Abstract

Bounded treewidth and Monadic Second Order (MSO) logic have proved to be key concepts in establishing
fixed-parameter tractability results. Indeed, by Courcelle's Theorem we know: Any property of finite
structures that is expressible by an MSO sentence can be decided in linear time (data complexity) if the
structures have bounded treewidth. In principle, Courcelle's Theorem can be applied directly to construct
concrete algorithms by transforming the MSO evaluation problem into a tree language recognition problem.
The latter can then be solved via a finite tree automaton (FTA). However, this approach has turned out to
be problematic, since even relatively simple MSO formulae may lead to a "state explosion" of the FTA.

In this work we propose monadic datalog (i.e., datalog where all intensional predicate symbols are
unary) as an alternative method to tackle this class of fixed-parameter tractable problems. We show that if
some property of finite structures is expressible in MSO then this property can also be expressed by means
of a monadic datalog program over the structure plus the tree decomposition. Moreover, we show that the
resulting fragment of datalog can be evaluated in linear time (both w.r.t. the program size and w.r.t. the
data size). This new approach is put to work by devising new algorithms for the 3-Colorability problem of
graphs and for the PRIMALITY problem of relational schemas (i.e., testing if some attribute in a relational
schema is part of a key). We also report on experimental results with a prototype implementation.



1 Introduction 

Over the past decade, parameterized complexity has evolved as an important subdiscipline in the field of
computational complexity, see [8, 14]. In particular, it has been shown that many hard problems become
tractable if some problem parameter is fixed or bounded by a constant. In the arena of graphs and, more
generally, of finite structures, the treewidth is one such parameter which has served as the key to many
fixed-parameter tractability (FPT) results. The most prominent method for establishing the FPT in case of
bounded treewidth is via Courcelle's Theorem, see Q: Any property of finite structures that is expressible
by a Monadic Second Order (MSO) sentence can be decided in linear time (data complexity) if the treewidth
of the structures is bounded by a fixed constant.

*This is an extended and enhanced version of results published in [19]. The work was partially supported by the Austrian Science
Fund (FWF), project P20704-N18.

†Work performed while the author was with Technische Universität Wien.

Recipes as to how one can devise concrete algorithms based on Courcelle's Theorem can be found in
the literature, see [2, 13]. The idea is to first translate the MSO evaluation problem over finite structures into
an equivalent MSO evaluation problem over colored binary trees. This problem can then be solved via the
correspondence between MSO over trees and finite tree automata (FTA), see [29, 6]. In theory, this generic
method of turning an MSO description into a concrete algorithm looks very appealing. However, in practice,
it has turned out that even relatively simple MSO formulae may lead to a "state explosion" of the FTA, see
[15, 26]. Consequently, it was already stated in [21] that the algorithms derived via Courcelle's Theorem are
"useless for practical applications". The main benefit of Courcelle's Theorem is that it provides "a simple
way to recognize a property as being linear time computable". In other words, proving the FPT of some
problem by showing that it is MSO expressible is the starting point (rather than the end point) of the search
for an efficient algorithm.

In this work we propose monadic datalog (i.e., datalog where all intensional predicate symbols are 
unary) as a practical tool for devising efficient algorithms in situations where the FPT has been shown 
via Courcelle's Theorem. Above all, we prove that if some property of finite structures is expressible in 
MSO then this property can also be expressed by means of a monadic datalog program over the structure 
plus the tree decomposition. Hence, in the first place, we prove an expressivity result rather than a mere 
complexity result. However, we also show that the resulting fragment of datalog can be evaluated in linear 
time (both w.r.t. the program size and w.r.t. the data size). We thus get the corresponding complexity result
(i.e., Courcelle's Theorem) as a corollary of this MSO-to-datalog transformation. 

Our MSO-to-datalog transformation for finite structures with bounded treewidth generalizes a result 
from [16] where it was shown that MSO on trees has the same expressive power as monadic datalog on
trees. Several obstacles had to be overcome to prove this generalization: 

• First of all, we no longer have to deal with a single universe, namely the universe of trees whose 
domain consists of the tree nodes. Instead, we now have to deal with - and constantly switch between 
- two universes, namely the relational structure (with its own signature and its own domain) on the 
one hand, and the tree decomposition (with appropriate predicates expressing the tree structure and 
with the tree nodes as a separate domain) on the other hand. 

• Of course, not only the MSO-to-datalog transformation itself had to be lifted to the case of two
universes. Also important prerequisites of the results in [16] (notably several results on
MSO-equivalences of tree structures shown in [28]) had to be extended to this new situation.

• Apart from switching between the two universes, it is ultimately necessary to integrate both universes 
into the monadic datalog program. For this purpose, both the signature and the domain of the finite 
structure have to be appropriately extended. 

• It has turned out that previous notions of standard or normal forms of tree decompositions (see fF' TSI) 
are not suitable for our purposes. We therefore have to introduce a modified version of "normalized 
tree decompositions", which is then further refined as we present new algorithms based on monadic 
datalog. 

In the second part of this paper, we put monadic datalog to work by presenting new algorithms for the
3-Colorability problem of graphs and for the PRIMALITY problem of relational schemas (i.e., testing if some
attribute in a relational schema is part of a key). Both problems are well-known to be intractable (e.g., see
[25] for PRIMALITY). It is folklore that the 3-Colorability problem can be expressed by an MSO sentence.
In [18], it was shown that PRIMALITY is MSO expressible. Hence, in case of bounded treewidth, both
problems become tractable. However, two attempts to tackle these problems via the standard MSO-to-FTA
approach turned out to be very problematic: We experimented with a prototype implementation using
MONA (see [22]) for the MSO model checking, but we ended up with "out-of-memory" errors already for
really small input data (see Section 6). Alternatively, we made an attempt to directly implement the
MSO-to-FTA mapping proposed in ifTSl . However, the "state explosion" of the resulting FTA - which tends to
occur already for comparatively simple formulae (cf. [26]) - led to failure even before we were able to feed
any input data to the program.

In contrast, the experimental results with our new datalog approach look very promising, see Section 6.
From the experience gained with these experiments, the following advantages of datalog compared with MSO
became apparent:

• Level of declarativity. MSO as a logic has the highest level of declarativity, which often allows for
very elegant and succinct problem specifications. However, MSO does not have an operational semantics.
In order to turn an MSO specification into an algorithm, the standard approach is to transform the
MSO evaluation problem into a tree language recognition problem. But the FTA clearly has a much
lower level of declarativity and the intuition of the original problem is usually lost when an FTA is
constructed. In contrast, the datalog program with its declarative style often reflects both the intuition
of the original problem and of the algorithmic solution. This intuition can be exploited for defining
heuristics which lead to problem-specific optimizations.

• General optimizations. A lot of research has been devoted to generally applicable (i.e., not
problem-specific) optimization techniques of datalog (see e.g. [4]). In our implementation (see Section 6),
we make heavy use of these optimization techniques, which are not available in the MSO-to-FTA
approach.

• Flexibility. The generic transformation of MSO formulae to monadic datalog programs (given in
Section 4) inevitably leads to programs of exponential size w.r.t. the size of the MSO-formula and
the treewidth. However, as our programs for 3-Colorability and PRIMALITY demonstrate, many
relevant properties can be expressed by really short programs. Moreover, as we will see in Section 5,
datalog also provides us with a certain level of succinctness. In fact, we will be able to express a big
monadic datalog program by a small non-monadic program.

• Required transformations. The problem of a "state explosion" reported in [26] already refers to the
transformation of (relatively simple) MSO formulae on trees to an FTA. If we consider MSO on structures
with bounded treewidth the situation gets even worse, since the original (possibly simple) MSO
formula over a finite structure first has to be transformed into an equivalent MSO formula over trees.
This transformation (e.g., by the algorithm in [13]) leads to a much more complex formula (in general,
even with additional quantifier alternations) than the original formula. In contrast, our approach
works with monadic datalog programs on finite structures which need no further transformation. Each
program can be executed as it is.

• Extending the programming language. One more aspect of the flexibility of datalog is the possibility
to define new built-in predicates if they admit an efficient implementation by the interpreter.
Another example of a useful language extension is the introduction of generalized quantifiers. For the
theoretical background of this concept, see lfm[T2l . 

Some applications require a fast execution which cannot always be guaranteed by an interpreter. Hence, 
while we propose a logic programming approach, one can of course go one step further and implement our 
algorithms directly in Java, C++, etc. following the same paradigm.

The paper is organized as follows. After recalling some basic notions and results in Section 2, we prove
several results on the MSO-equivalence of substructures induced by subtrees of a tree decomposition in
Section 3. In Section 4, it is shown that any MSO formula with one free individual variable over structures
with bounded treewidth can be transformed into an equivalent monadic datalog program. In Section 5, we
put monadic datalog to work by presenting new FPT algorithms for the 3-Colorability problem and for the
PRIMALITY problem in case of bounded treewidth. In Section 6, we report on experimental results with a
prototype implementation. A conclusion is given in Section 7.

2 Preliminaries



2.1 Relational Schemas and Primality 

We briefly recall some basic notions and results from database design theory (for details, see [25]). In
particular, we shall define the PRIMALITY problem, which will serve as a running example throughout this
paper.

A relational schema is denoted as $(R, F)$ where $R$ is the set of attributes, and $F$ the set of functional
dependencies (FDs, for short) over $R$. W.l.o.g., we only consider FDs whose right-hand side consists of
a single attribute. Let $f \in F$ with $f\colon Y \to A$. We refer to $Y \subseteq R$ and $A \in R$ as $lhs(f)$ and $rhs(f)$,
respectively. The intended meaning of an FD $f\colon Y \to A$ is that, in any valid database instance of $(R, F)$,
the value of the attribute $A$ is uniquely determined by the value of the attributes in $Y$. It is convenient to
denote a set $\{A_1, A_2, \ldots, A_n\}$ of attributes as a string $A_1 A_2 \ldots A_n$. For instance, we write
$f\colon ab \to c$ rather than $f\colon \{a, b\} \to c$.

For any $X \subseteq R$, we write $X^+$ to denote the closure of $X$, i.e., the set of all attributes determined by $X$.
An attribute $A$ is contained in $X^+$ iff either $A \in X$ or there exists a "derivation sequence" of $A$ from $X$ in
$F$ of the form $X \to X \cup \{A_1\} \to X \cup \{A_1, A_2\} \to \ldots \to X \cup \{A_1, \ldots, A_n\}$, s.t. $A_n = A$ and for every
$i \in \{1, \ldots, n\}$, there exists an FD $f_i \in F$ with $lhs(f_i) \subseteq X \cup \{A_1, \ldots, A_{i-1}\}$ and $rhs(f_i) = A_i$.
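The derivation-sequence characterization of the closure corresponds to a simple fixpoint computation. The following Python sketch is only an illustration and is not taken from this paper; the function name `closure` and the encoding of an FD as a pair (left-hand side, right-hand side attribute) are assumptions made here.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`.

    `fds` is an iterable of pairs (lhs, rhs): lhs is a collection of
    attributes, rhs a single attribute (hypothetical encoding).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires once its whole left-hand side has been derived.
            if rhs not in result and set(lhs) <= result:
                result.add(rhs)
                changed = True
    return result


# Two FDs, ab -> c and c -> d, over the attributes a, b, c, d.
fds = [({"a", "b"}, "c"), ({"c"}, "d")]
print(sorted(closure({"a", "b"}, fds)))  # ['a', 'b', 'c', 'd']
print(sorted(closure({"a"}, fds)))       # ['a']
```

Each pass of the loop extends the derivation sequence by at least one attribute, so the loop stops after at most $|R|$ passes.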

If $X^+ = R$ then $X$ is called a superkey. If $X$ is minimal with this property, then $X$ is a key. An
attribute $A$ is called prime if it is contained in at least one key in $(R, F)$. An efficient algorithm for testing
the primality of an attribute is crucial in database design since it is an indispensable prerequisite for testing
if a schema is in third normal form. However, given a relational schema $(R, F)$ and an attribute $A \in R$, it
is NP-complete to test if $A$ is prime (cf. [25]).

We shall consider two variants of the PRIMALITY problem in this paper (see Section 5.2 and 5.3, resp.):
the decision problem (i.e., given a relational schema $(R, F)$ and an attribute $A \in R$, is $A$ prime in $(R, F)$?)
and the enumeration problem (i.e., given a relational schema $(R, F)$, compute all prime attributes in $(R, F)$).

Example 2.1 Consider the relational schema $(R, F)$ with $R = abcdeg$ and $F = \{f_1\colon ab \to c,\ f_2\colon c \to b,$
$f_3\colon cd \to e,\ f_4\colon de \to g,\ f_5\colon g \to e\}$. It can be easily checked that there are two keys for the schema: $abd$
and $acd$. Thus the attributes $a$, $b$, $c$ and $d$ are prime, while $e$ and $g$ are not prime.
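The claims of Example 2.1 can be checked mechanically. The sketch below enumerates all keys by brute force; it is exponential in $|R|$ and is meant only as a sanity check, not as one of the algorithms developed in this paper. The helper names `closure` and `keys` and the string encoding of attribute sets are assumptions made for illustration.

```python
from itertools import combinations

R = "abcdeg"
# The five FDs of Example 2.1 as (lhs, rhs) pairs.
F = [("ab", "c"), ("c", "b"), ("cd", "e"), ("de", "g"), ("g", "e")]


def closure(attrs, fds):
    """Closure of an attribute set under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and set(lhs) <= result:
                result.add(rhs)
                changed = True
    return result


def keys(attributes, fds):
    """Enumerate all keys, visiting candidates by increasing size."""
    found = []
    for size in range(len(attributes) + 1):
        for cand in combinations(sorted(attributes), size):
            cand = set(cand)
            is_superkey = closure(cand, fds) == set(attributes)
            # A superkey is minimal iff it contains no smaller key.
            if is_superkey and not any(k <= cand for k in found):
                found.append(cand)
    return found


all_keys = keys(R, F)
prime = set().union(*all_keys)
print(sorted("".join(sorted(k)) for k in all_keys))  # ['abd', 'acd']
print(sorted(prime))                                 # ['a', 'b', 'c', 'd']
```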

2.2 Finite Structures and Treewidth 

Let $\tau = \{R_1, \ldots, R_K\}$ be a set of predicate symbols. A finite structure $\mathcal{A}$ over $\tau$ (a $\tau$-structure, for short)
is given by a finite domain $A = dom(\mathcal{A})$ and relations $R_i^{\mathcal{A}} \subseteq A^{\alpha}$, where $\alpha$ denotes the arity of $R_i \in \tau$. A
finite structure may also be given in the form $(\mathcal{A}, \bar{a})$ where, in addition to $\mathcal{A}$, we have distinguished elements
$\bar{a} = (a_0, \ldots, a_w)$ from $dom(\mathcal{A})$. Such distinguished elements are required for interpreting formulae with
free variables.

A tree decomposition $\mathcal{T}$ of a $\tau$-structure $\mathcal{A}$ is defined as a pair $\langle T, (A_t)_{t \in T} \rangle$ where $T$ is a tree and each
$A_t$ is a subset of $A$ with the following properties: (1) Every $a \in A$ is contained in some $A_t$. (2) For every
$R_i \in \tau$ and every tuple $(a_1, \ldots, a_{\alpha}) \in R_i^{\mathcal{A}}$, there exists some node $t \in T$ with $\{a_1, \ldots, a_{\alpha}\} \subseteq A_t$. (3) For
every $a \in A$, the set $\{t \mid a \in A_t\}$ induces a subtree of $T$.

The third condition is usually referred to as the connectedness condition. The sets $A_t$ are called the bags
(or blocks) of $\mathcal{T}$. The width of a tree decomposition $\langle T, (A_t)_{t \in T} \rangle$ is defined as $\max\{|A_t| \mid t \in T\} - 1$. The
treewidth of $\mathcal{A}$ is the minimal width of all tree decompositions of $\mathcal{A}$. It is denoted as $tw(\mathcal{A})$. Note that trees
and forests are precisely the structures with treewidth 1.
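The three conditions and the width can be checked directly from the definition. The following Python sketch assumes a hypothetical encoding: relations as a dictionary from predicate names to sets of tuples, bags as a dictionary from tree nodes to sets, and the tree as a list of undirected edges. It is a naive check for illustration only; in particular, it does not verify that the edges really form a tree.

```python
def is_tree_decomposition(domain, relations, bags, tree_edges):
    """Check conditions (1)-(3) for bags indexed by tree nodes."""
    # (1) Every domain element occurs in some bag.
    if any(all(a not in bag for bag in bags.values()) for a in domain):
        return False
    # (2) Every tuple of every relation fits into a single bag.
    for tuples in relations.values():
        for tup in tuples:
            if not any(set(tup) <= bag for bag in bags.values()):
                return False
    # (3) Connectedness: the nodes whose bag contains a form a subtree.
    for a in domain:
        nodes = {t for t, bag in bags.items() if a in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for u, v in tree_edges:
                for x, y in ((u, v), (v, u)):
                    if x == t and y in nodes and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a decomposition: maximal bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A path a - b - c, encoded with one binary relation E.
domain = {"a", "b", "c"}
relations = {"E": {("a", "b"), ("b", "c")}}
bags = {1: {"a", "b"}, 2: {"b", "c"}}
print(is_tree_decomposition(domain, relations, bags, [(1, 2)]), width(bags))  # True 1
```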

For given $w \geq 1$, it can be decided in linear time if some structure has treewidth $\leq w$. Moreover, in
case of a positive answer, a tree decomposition of width $w$ can be computed in linear time, see [3].

In this paper, we assume that a relational schema $(R, F)$ is given as a $\tau$-structure with $\tau = \{fd, att, lh,$
$rh\}$. The intended meaning of these predicates is as follows: $fd(f)$ means that $f$ is an FD and $att(b)$ means
that $b$ is an attribute. $lh(b, f)$ (resp. $rh(b, f)$) means that $b$ occurs in $lhs(f)$ (resp. in $rhs(f)$). The treewidth
of $(R, F)$ is then defined as the treewidth of this $\tau$-structure.

Example 2.2 Recall the relational schema $(R, F)$ with $R = abcdeg$ and $F = \{f_1\colon ab \to c,\ f_2\colon c \to b,$
$f_3\colon cd \to e,\ f_4\colon de \to g,\ f_5\colon g \to e\}$ from Example 2.1. This schema is represented as the following
$\tau$-structure with $\tau = \{fd, att, lh, rh\}$: $\mathcal{A} = (A, fd^{\mathcal{A}}, att^{\mathcal{A}}, lh^{\mathcal{A}}, rh^{\mathcal{A}})$ with $A = R \cup F$,
$fd^{\mathcal{A}} = \{f_1, f_2, f_3, f_4, f_5\}$, $att^{\mathcal{A}} = \{a, b, c, d, e, g\}$,
$lh^{\mathcal{A}} = \{(a, f_1), (b, f_1), (c, f_2), (c, f_3), (d, f_3), (d, f_4), (e, f_4), (g, f_5)\}$,
$rh^{\mathcal{A}} = \{(c, f_1), (b, f_2), (e, f_3), (g, f_4), (e, f_5)\}$.

A tree decomposition $\mathcal{T}$ of this structure is given in Figure 1. Note that the maximal size of the bags in
$\mathcal{T}$ is 3. Hence, the tree-width is $\leq 2$. On the other hand, it is easy to check that the tree-width cannot
be smaller than 2: In order to see this, we consider the tuples in $lh^{\mathcal{A}}$ and $rh^{\mathcal{A}}$ as edges of an undirected
graph. Then the edges corresponding to $(b, f_1), (c, f_2) \in lh^{\mathcal{A}}$ and $(b, f_2), (c, f_1) \in rh^{\mathcal{A}}$ form a cycle in this
graph. However, as we have recalled above, only trees and forests have treewidth 1. The tree decomposition
in Figure 1 is, therefore, optimal and we have $tw(F) = tw(\mathcal{A}) = 2$.
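The encoding of a schema over the signature $\{fd, att, lh, rh\}$ and the cycle used in the lower-bound argument can be reproduced with a few lines of Python. This is a sketch under assumed names (`schema_to_structure`, `FDS`, `ATTRIBUTES`); the FDs are those of Example 2.1.

```python
# FDs of Example 2.1: name -> (lhs attributes, rhs attribute).
FDS = {
    "f1": ("ab", "c"),
    "f2": ("c", "b"),
    "f3": ("cd", "e"),
    "f4": ("de", "g"),
    "f5": ("g", "e"),
}
ATTRIBUTES = "abcdeg"


def schema_to_structure(attributes, fds):
    """Encode a schema as relations over the signature {fd, att, lh, rh}."""
    return {
        "fd": {(f,) for f in fds},
        "att": {(a,) for a in attributes},
        "lh": {(b, f) for f, (lhs, _) in fds.items() for b in lhs},
        "rh": {(rhs, f) for f, (_, rhs) in fds.items()},
    }


structure = schema_to_structure(ATTRIBUTES, FDS)
# Read the lh and rh tuples as undirected edges between attributes and FDs.
edges = {frozenset(t) for t in structure["lh"] | structure["rh"]}
cycle = ["b", "f1", "c", "f2", "b"]
print(all(frozenset(e) in edges for e in zip(cycle, cycle[1:])))  # True
```

The closed walk b, f1, c, f2, b uses exactly the four tuples named in the example, which is why no decomposition of width 1 can exist.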




Figure 1: Tree decomposition $\mathcal{T}$ of schema $(R, F)$ in Example 2.1



Remark. A relational schema $(R, F)$ defines a hypergraph $H(R, F)$ whose vertices are the attributes
in $R$ and whose hyperedges are the sets of attributes jointly occurring in at least one FD in $F$. Recall that
the incidence graph of a hypergraph $H$ contains as nodes the vertices and hyperedges of $H$. Moreover, two
nodes $v$ and $h$ (corresponding to a vertex $v$ and a hyperedge $h$ in $H$) are connected in this graph iff (in the
hypergraph $H$) $v$ occurs in $h$. It can be easily verified that the treewidth of the above described $\tau$-structure
and of the incidence graph of the hypergraph $H(R, F)$ coincide.

In this paper, we consider the following form of normalized tree decompositions, which is similar to the 
normal form introduced in Theorem 6.72 of lH: 

Definition 2.3 Let $\mathcal{A}$ be an arbitrary structure with tree decomposition $\mathcal{T}$ of width $w$. We call $\mathcal{T}$ normalized
if the conditions 1-4 are fulfilled: (1) The bags are considered as tuples of $w + 1$ pairwise distinct elements
$(a_0, \ldots, a_w)$ rather than sets. (2) Every internal node $t \in T$ has either 1 or 2 child nodes. (3) If a node
$t$ with bag $(a_0, \ldots, a_w)$ has one child node, then the bag of the child is either obtained via a permutation
of $(a_0, \ldots, a_w)$ or by replacing $a_0$ with another element $a_0'$. We call such a node $t$ a permutation node or
an element replacement node, respectively. (4) If a node $t$ has two child nodes then these child nodes have
identical bags as $t$. In this case, we call $t$ a branch node.
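The node types of Definition 2.3 can be told apart by comparing a bag with the bags of its children. The following Python sketch assumes that bags are given as tuples; the function name `node_kind` and the returned labels are hypothetical and serve only to illustrate the case distinction.

```python
def node_kind(bag, child_bags):
    """Classify a node of a normalized tree decomposition.

    `bag` and the entries of `child_bags` are tuples. Returns 'leaf',
    'permutation', 'replacement' or 'branch', or None if the node
    violates the normal form.
    """
    if len(set(bag)) != len(bag):
        return None  # elements of a bag must be pairwise distinct
    if not child_bags:
        return "leaf"
    if any(len(c) != len(bag) or len(set(c)) != len(c) for c in child_bags):
        return None
    if len(child_bags) == 1:
        child = child_bags[0]
        if set(child) == set(bag):
            return "permutation"  # same elements, possibly reordered
        if child[1:] == bag[1:] and child[0] not in bag:
            return "replacement"  # only position 0 carries a new element
        return None
    if len(child_bags) == 2 and all(c == bag for c in child_bags):
        return "branch"
    return None


print(node_kind(("f1", "b", "c"), [("b", "f1", "c")]))  # permutation
print(node_kind(("a", "f1", "c"), [("b", "f1", "c")]))  # replacement
```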

Proposition 2.4 Let $\mathcal{A}$ be an arbitrary structure with tree decomposition $\mathcal{T}$ of width $w$. W.l.o.g., we may
assume that the domain $dom(\mathcal{A})$ has at least $w + 1$ elements. Then $\mathcal{T}$ can be transformed in linear time
into a normalized tree decomposition $\mathcal{T}'$, s.t. $\mathcal{T}$ and $\mathcal{T}'$ have identical width.

Proof. We can transform an arbitrary tree decomposition $\mathcal{T}$ into a normalized tree decomposition $\mathcal{T}'$ by the
following steps (1) - (5). Clearly this transformation works in linear time and preserves the width.

(1) All bags can be padded to the "full" size of $w + 1$ elements by adding elements from a neighboring
bag, e.g.: Let $s$ and $s'$ be adjacent nodes and let $A_s$ have $w + 1$ elements (in a tree decomposition of width
$w$, at least one such node exists) and let $|A_{s'}| = w' + 1$ with $w' < w$. Then $|A_s \setminus A_{s'}| \geq (w - w')$ and we
may simply add $(w - w')$ elements from $A_s \setminus A_{s'}$ to $A_{s'}$ without violating the connectedness condition.

(2) Suppose that some internal node $s$ has $k + 2$ child nodes $t_1, \ldots, t_{k+2}$ with $k > 0$. It is a standard
technique to turn this part of the tree into a binary tree by inserting copies of $s$ into the tree, i.e., we introduce
$k$ nodes $s_1, \ldots, s_k$ with $A_{s_i} = A_s$, s.t. the second child of $s$ is $s_1$, the second child of $s_1$ is $s_2$, the second
child of $s_2$ is $s_3$, etc. Moreover, $t_1$ remains the first child of $s$, while $t_2$ becomes the first child of $s_1$,
$t_3$ becomes the first child of $s_2$, ..., $t_{k+1}$ becomes the first child of $s_k$. Finally, $t_{k+2}$ becomes the second child
of $s_k$. Clearly, the connectedness condition is preserved by this construction.

(3) If an internal node $s$ has two children $t_1$ and $t_2$, s.t. the bags of $s$, $t_1$, and $t_2$ are not identical, then
we simply insert a copy $s_1$ of $s$ between $s$ and $t_1$ and another copy $s_2$ of $s$ between $s$ and $t_2$.

(4) Let $s$ be the parent of $s'$ and let $|A_s \setminus A_{s'}| = k$ with $k > 1$. Then we can obviously "interpolate"
$s$ and $s'$ by new nodes $s_1, \ldots, s_{k-1}$, s.t. $s_{k-1}$ is the new parent of $s'$, $s_{k-2}$ is the parent of $s_{k-1}$, ..., $s$ is
the parent of $s_1$. Moreover, the bags $A_{s_i}$ can be defined in such a way that the bags of any two neighboring
nodes differ in exactly one element, e.g. $|A_s \setminus A_{s_1}| = |A_{s_1} \setminus A_s| = 1$.

(5) Let the bags of any two neighboring nodes $s$ and $s'$ differ by one element, i.e., $\exists a \in A_s$ with $a \notin A_{s'}$
and $\exists a' \in A_{s'}$ with $a' \notin A_s$. Then we can insert two "interpolation nodes" $t$ and $t'$, s.t. $A_t$ has the same
elements as $A_s$ but with $a$ at position 0. Likewise, $A_{t'}$ has the same elements as $A_{s'}$ but with $a'$ at position
0. □
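Step (2) of the proof is the easiest one to make concrete. The following Python sketch inserts the copies $s_1, \ldots, s_k$ for every node with more than two children. The dictionary encoding of the tree and the naming of copies as pairs `(s, i)` are assumptions made for illustration; the other steps of the proof are not covered.

```python
def binarize(children, bags):
    """Give every node at most two children by inserting copies.

    `children` maps a node to its ordered list of child nodes and `bags`
    maps a node to its bag. The i-th copy of s is named (s, i), which is
    assumed not to clash with existing node names. Returns new
    dictionaries; the inputs are left unchanged.
    """
    new_children, new_bags = {}, dict(bags)
    for s, kids in children.items():
        current, rest, i = s, list(kids), 0
        while len(rest) > 2:
            i += 1
            copy = (s, i)             # fresh copy s_i of s
            new_bags[copy] = bags[s]  # with the same bag as s
            # First child stays, the copy becomes the second child.
            new_children[current] = [rest[0], copy]
            current, rest = copy, rest[1:]
        new_children[current] = rest
    return new_children, new_bags


children = {"s": ["t1", "t2", "t3", "t4"]}
bags = {"s": ("a", "b"), "t1": ("a", "b"), "t2": ("a", "c"),
        "t3": ("b", "d"), "t4": ("a", "e")}
tree, new_bags = binarize(children, bags)
print(tree)
```

For a node with $k + 2 = 4$ children this introduces $k = 2$ copies: `s` keeps `t1` and gets `('s', 1)` as second child, `('s', 1)` gets `t2` and `('s', 2)`, and `('s', 2)` gets `t3` and `t4`.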

Example 2.5 The tree decomposition $\mathcal{T}$ in Figure 1 is clearly not normalized. In contrast, tree decomposition
$\mathcal{T}'$ in Figure 2 is normalized in the above sense. Let us ignore the node identifiers $s_1, \ldots, s_{22}$ for the
moment. Note that $\mathcal{T}$ and $\mathcal{T}'$ have identical width.




Figure 2: Normalized tree decomposition $\mathcal{T}'$ of schema $(R, F)$ in Example 2.1



2.3 Monadic Second Order Logic 

We assume some familiarity with Monadic Second Order logic (MSO), see e.g. f^'/S?!. MSO extends First 
Order logic (FO) by the use of set variables (usually denoted by upper case letters), which range over sets 
of domain elements. In contrast, the individual variables (which are usually denoted by lower case letters) 
range over single domain elements. An FO-formula $\varphi$ over a $\tau$-structure has as atomic formulae either
atoms with some predicate symbol from $\tau$ or equality atoms. An MSO-formula $\varphi$ over a $\tau$-structure may
additionally have atoms whose predicate symbol is a monadic predicate variable. For the sake of readability,
we denote such an atom usually as $a \in X$ rather than $X(a)$. Likewise, we use set operators $\subseteq$ and $\subset$ with
the obvious meaning.

The quantifier depth of an MSO-formula $\varphi$ is defined as the maximum degree of nesting of quantifiers
(both for individual variables and set variables) in $\varphi$. In this work, we will mainly encounter MSO formulae
with free individual variables. A formula $\varphi(x)$ with exactly one free individual variable is called a unary
query. More generally, let $\varphi(\bar{x})$ with $\bar{x} = (x_0, \ldots, x_w)$ for some $w \geq 0$ be an MSO formula with free
variables $\bar{x}$. Furthermore, let $\mathcal{A}$ be a $\tau$-structure and $\bar{a} = (a_0, \ldots, a_w)$ be distinguished domain elements.
We write $(\mathcal{A}, \bar{a}) \models \varphi(\bar{x})$ to denote that $\varphi(\bar{a})$ evaluates to true in $\mathcal{A}$. Usually, we refer to $(\mathcal{A}, \bar{a})$ simply as a
"structure" rather than a "structure with distinguished domain elements".

Example 2.6 It was shown in [18] that primality can be expressed in MSO. We give a slightly different
MSO-formula $\varphi(x)$ here, which is better suited for our purposes in Section 5, namely

$\varphi(x) = (\exists Y)[Y \subseteq R \wedge Closed(Y) \wedge x \notin Y \wedge Closure(Y \cup \{x\}, R)]$ with
$Closed(Y) = (\forall f)[fd(f) \rightarrow (\exists b)[(rh(b, f) \wedge b \in Y) \vee (lh(b, f) \wedge b \notin Y)]]$ and
$Closure(Y, Z) = Y \subseteq Z \wedge Closed(Z) \wedge \neg(\exists Z')[Y \subseteq Z' \wedge Z' \subset Z \wedge Closed(Z')]$.

This formula expresses the following characterization of primality: An attribute $a$ is prime, iff there exists
an attribute set $\mathcal{Y} \subseteq R$, s.t. $\mathcal{Y}$ is closed w.r.t. $F$ (i.e., $\mathcal{Y}^+ = \mathcal{Y}$), $a \notin \mathcal{Y}$ and $(\mathcal{Y} \cup \{a\})^+ = R$. In other words,
$\mathcal{Y} \cup \{a\}$ is a superkey but $\mathcal{Y}$ is not.

Recall the $\tau$-structure $\mathcal{A}$ from Example 2.2 representing a relational schema. It can be easily verified
that $(\mathcal{A}, a) \models \varphi(x)$ and $(\mathcal{A}, e) \not\models \varphi(x)$ hold.
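The formula of Example 2.6 can be evaluated naively by letting the set quantifiers range over all subsets of $R$. The Python sketch below does exactly that for the schema of Example 2.1. It is exponential and only meant to confirm the stated truth values; the function names `closed`, `is_closure` and `phi` mirror the subformulae but are otherwise assumptions of this illustration.

```python
from itertools import chain, combinations

R = "abcdeg"
F = [("ab", "c"), ("c", "b"), ("cd", "e"), ("de", "g"), ("g", "e")]


def subsets(items):
    """All subsets of `items`, as Python sets."""
    items = list(items)
    return (set(c) for c in chain.from_iterable(
        combinations(items, n) for n in range(len(items) + 1)))


def closed(Y):
    # Closed(Y): every FD has its rhs in Y or some lhs attribute outside Y.
    return all(rhs in Y or any(b not in Y for b in lhs) for lhs, rhs in F)


def is_closure(Y, Z):
    # Closure(Y, Z): Z is closed, contains Y, and no proper subset does both.
    if not (Y <= Z and closed(Z)):
        return False
    return not any(Y <= Z2 and Z2 < Z and closed(Z2) for Z2 in subsets(Z))


def phi(x):
    # There is a closed Y without x such that the closure of Y + {x} is R.
    return any(closed(Y) and x not in Y and is_closure(Y | {x}, set(R))
               for Y in subsets(R))


print([a for a in R if phi(a)])  # ['a', 'b', 'c', 'd']
```

The result agrees with Example 2.1: the formula holds for $a$, $b$, $c$ and $d$, and fails for $e$ and $g$.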

We call two structures $(\mathcal{A}, \bar{a})$ and $(\mathcal{B}, \bar{b})$ $k$-equivalent and write $(\mathcal{A}, \bar{a}) \equiv_k^{MSO} (\mathcal{B}, \bar{b})$, iff for every
MSO-formula $\varphi$ of quantifier depth $\leq k$, the equivalence $(\mathcal{A}, \bar{a}) \models \varphi \Leftrightarrow (\mathcal{B}, \bar{b}) \models \varphi$ holds. By definition,
$\equiv_k^{MSO}$ is an equivalence relation. For any $k$, the relation $\equiv_k^{MSO}$ has only finitely many equivalence classes.
These equivalence classes are referred to as $k$-types or simply as types. The $\equiv_k^{MSO}$-equivalence between two
structures can be effectively decided. There is a nice characterization of $\equiv_k^{MSO}$-equivalence by
Ehrenfeucht-Fraïssé games: The $k$-round MSO-game on two structures $(\mathcal{A}, \bar{a})$ and $(\mathcal{B}, \bar{b})$ is played between two players
- the spoiler and the duplicator. In each of the $k$ rounds, the spoiler can choose between a point move and
a set move. If, in the $i$-th round, he makes a point move, then he selects some element $c_i \in dom(\mathcal{A})$ or
some element $d_i \in dom(\mathcal{B})$. The duplicator answers by choosing an element in the opposite structure. If,
in the $i$-th round, the spoiler makes a set move, then he selects a set $P_i \subseteq dom(\mathcal{A})$ or a set $Q_i \subseteq dom(\mathcal{B})$.
The duplicator answers by choosing a set of domain elements in the opposite structure. Suppose that, in
$k$ rounds, the domain elements $c_1, \ldots, c_m$ and $d_1, \ldots, d_m$ from $dom(\mathcal{A})$ and $dom(\mathcal{B})$, respectively, were
chosen in the point moves. Likewise, suppose that the subsets $P_1, \ldots, P_n$ and $Q_1, \ldots, Q_n$ of $dom(\mathcal{A})$ and
$dom(\mathcal{B})$, respectively, were chosen in the set moves. The duplicator wins this game, if the mapping which
maps each $c_i$ to $d_i$ is a partial isomorphism from $(\mathcal{A}, \bar{a}, P_1, \ldots, P_n)$ to $(\mathcal{B}, \bar{b}, Q_1, \ldots, Q_n)$. We say that the
duplicator has a winning strategy in the $k$-round MSO-game on $(\mathcal{A}, \bar{a})$ and $(\mathcal{B}, \bar{b})$ if he can win the game for
any possible moves of the spoiler.

The following relationship between $\equiv_k^{MSO}$-equivalence and $k$-round MSO-games holds: Two structures
$(\mathcal{A}, \bar{a})$ and $(\mathcal{B}, \bar{b})$ are $k$-equivalent iff the duplicator has a winning strategy in the $k$-round MSO-game on
$(\mathcal{A}, \bar{a})$ and $(\mathcal{B}, \bar{b})$, see M^-



2.4 Datalog 

We assume some familiarity with datalog, see e.g. [1, 4, 30]. Syntactically, a datalog program $\mathcal{P}$ is a
set of function-free Horn clauses. The (minimal-model) semantics can be defined as the least fixpoint of
applying the immediate consequence operator. Predicates occurring only in the body of rules in $\mathcal{P}$ are called
extensional, while predicates occurring also in the head of some rule are called intensional.

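Let $\mathcal{A}$ be a $\tau$-structure with domain $A$ and relations $R_1^{\mathcal{A}}, \ldots, R_K^{\mathcal{A}}$ with $R_i^{\mathcal{A}} \subseteq A^{\alpha}$, where $\alpha$ denotes the
arity of $R_i \in \tau$. In the context of datalog, it is convenient to think of the relations $R_i^{\mathcal{A}}$ as sets of ground
atoms. The set of all such ground atoms of a structure $\mathcal{A}$ is referred to as the extensional database (EDB) of
$\mathcal{A}$, which we shall denote as $\mathcal{E}(\mathcal{A})$ (or simply as $\mathcal{A}$, if no confusion is possible). We have $R_i(\bar{a}) \in \mathcal{E}(\mathcal{A})$ iff
$\bar{a} \in R_i^{\mathcal{A}}$.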

Evaluating a datalog program $\mathcal{P}$ over a structure $\mathcal{A}$ comes down to computing the least fixpoint of
$\mathcal{P} \cup \mathcal{A}$. Concerning the complexity of datalog, we are mainly interested in the combined complexity (i.e.,
the complexity w.r.t. the size of the program $\mathcal{P}$ plus the size of the data $\mathcal{A}$). In general, the combined
complexity of datalog is EXPTIME-complete (implicit in [31]). However, there are some fragments which
can be evaluated much more efficiently. (1) Propositional datalog (i.e., all rules are ground) can be evaluated
in linear time (combined complexity), see LL^U- (2) The guarded fragment of datalog (i.e., every rule 
$r$ contains an extensional atom $B$ in the body, s.t. all variables occurring in $r$ also occur in $B$) can be
evaluated in time $O(|\mathcal{P}| \cdot |\mathcal{A}|)$. (3) Monadic datalog (i.e., all intensional predicates are unary) is
NP-complete (combined complexity), see [16].
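For fragment (1), the least fixpoint of a ground program can be computed with one counter per rule, so that every body atom is inspected only once. The following Python sketch shows this counter-based propagation, a standard technique for propositional Horn clauses; it is not the evaluation procedure of this paper. The encoding of rules as pairs (head, body) and of atoms as plain strings is an assumption made for illustration.

```python
from collections import defaultdict, deque


def least_fixpoint(rules, facts):
    """Least fixpoint of a ground datalog program over an EDB.

    `rules` is a list of pairs (head, body) of ground atoms; `facts` is
    the set of EDB atoms. Runs in time linear in the total size of the
    rules and facts (up to hashing).
    """
    # remaining[i] = number of distinct body atoms of rule i not yet derived
    remaining = [len(set(body)) for _, body in rules]
    watchers = defaultdict(list)  # atom -> rules having it in the body
    for i, (_, body) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    queue = deque(facts)
    # Rules with an empty body fire immediately.
    queue.extend(head for i, (head, _) in enumerate(rules) if remaining[i] == 0)

    derived = set()
    while queue:
        atom = queue.popleft()
        if atom in derived:
            continue
        derived.add(atom)
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:  # whole body derived: the head follows
                queue.append(rules[i][0])
    return derived


rules = [
    ("reach_a", []),
    ("reach_b", ["reach_a", "edge_a_b"]),
    ("reach_c", ["reach_b", "edge_b_c"]),
    ("reach_d", ["reach_c", "edge_c_d"]),
]
facts = {"edge_a_b", "edge_b_c"}
print(sorted(least_fixpoint(rules, facts)))
```

In this small example `reach_d` is not derived, because the EDB contains no atom `edge_c_d`.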

3 Induced substructures



In this section, we study the $k$-types of substructures induced by certain subtrees of a tree decomposition (see
Definitions 3.1 and 3.2). Moreover, it is convenient to introduce some additional notation in Definition 3.4
below.



Definition 3.1 Let $T$ be a tree and $t$ a node in $T$. Then we denote the subtree rooted at $t$ as $T_t$. Moreover,
analogously to [28], we write $\bar{T}_t$ to denote the envelope of $T_t$. This envelope is obtained by removing all of
$T_t$ from $T$ except for the node $t$.

Likewise, let $\mathcal{T} = \langle T, (A_s)_{s \in T} \rangle$ be a tree decomposition of a finite structure. Then we define
$\mathcal{T}_t = \langle T_t, (A_s)_{s \in T_t} \rangle$ and $\bar{\mathcal{T}}_t = \langle \bar{T}_t, (A_s)_{s \in \bar{T}_t} \rangle$.

In other words, $t$ is the root node in $T_t$ while, in $\bar{T}_t$, it is a leaf node. Clearly, the only node occurring in
both $T_t$ and $\bar{T}_t$ is $t$.

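Definition 3.2 Let $\mathcal{A}$ be a finite structure and let $\mathcal{T} = \langle T, (A_t)_{t \in T} \rangle$ be a tree decomposition of $\mathcal{A}$. Moreover,
let $s$ be a node in $\mathcal{T}$ with bag $A_s = \bar{a} = (a_0, \ldots, a_w)$ and let $\mathcal{S}$ be one of the subtrees $\mathcal{T}_s$ or $\bar{\mathcal{T}}_s$ of $\mathcal{T}$.

Then we write $\mathcal{I}(\mathcal{A}, \mathcal{S}, s)$ to denote the structure $(\mathcal{A}', \bar{a})$, where $\mathcal{A}'$ is the substructure of $\mathcal{A}$ induced by
the elements occurring in the bags of $\mathcal{S}$.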

Example 3.3 Recall the relational schema $(R, F)$ represented by the structure $\mathcal{A}$ from Example 2.2 with
(normalized) tree decomposition $\mathcal{T}'$ in Figure 2. Consider, for instance, the node $s$ in $\mathcal{T}'$, as depicted in
Figure 3, with bag $A_s = (b, c)$. Then the induced substructure $\mathcal{I}(\mathcal{A}, \mathcal{T}'_s, s)$ is the substructure of $\mathcal{A}$ which
is induced by the elements occurring in the bags of $\mathcal{T}'_s$, whereas $\mathcal{I}(\mathcal{A}, \bar{\mathcal{T}}'_s, s)$ is the substructure of $\mathcal{A}$ which is
induced by the elements occurring in the bags of $\bar{\mathcal{T}}'_s$.




Figure 3: Induced substructures $\mathcal{T}'_s$ and $\bar{\mathcal{T}}'_s$ of the tree decomposition $\mathcal{T}'$ w.r.t. the node $s$.



Definition 3.4 Let $w \geq 1$ be a natural number and let $\mathcal{A}$ and $\mathcal{B}$ be finite structures over some signature $\tau$.
Moreover, let $(a_0, \ldots, a_w)$ (resp. $(b_0, \ldots, b_w)$) be a tuple of pairwise distinct elements in $\mathcal{A}$ (resp. $\mathcal{B}$).

We call $(a_0, \ldots, a_w)$ and $(b_0, \ldots, b_w)$ equivalent and write $(a_0, \ldots, a_w) \equiv (b_0, \ldots, b_w)$, iff for any
predicate symbol $R \in \tau$ with arity $\alpha$ and for all tuples $(i_1, \ldots, i_{\alpha}) \in \{0, \ldots, w\}^{\alpha}$, the equivalence
$R^{\mathcal{A}}(a_{i_1}, \ldots, a_{i_{\alpha}}) \Leftrightarrow R^{\mathcal{B}}(b_{i_1}, \ldots, b_{i_{\alpha}})$ holds.
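The equivalence of Definition 3.4 compares the atomic facts that hold on corresponding positions of the two tuples. A direct Python sketch follows; the function name `tuples_equivalent` and the dictionary encoding of relations and arities are assumptions made for illustration.

```python
from itertools import product


def tuples_equivalent(a, b, rel_a, rel_b, arities):
    """Check whether the tuples a and b are equivalent.

    `rel_a` and `rel_b` map predicate names to sets of tuples in the two
    structures; `arities` maps predicate names to their arities.
    """
    if len(a) != len(b):
        return False
    for name, arity in arities.items():
        # Range over all index tuples (i_1, ..., i_arity).
        for idx in product(range(len(a)), repeat=arity):
            in_a = tuple(a[i] for i in idx) in rel_a[name]
            in_b = tuple(b[i] for i in idx) in rel_b[name]
            if in_a != in_b:
                return False
    return True


arities = {"E": 2}
rel_a = {"E": {(1, 2)}}
rel_b = {"E": {("x", "y")}}
print(tuples_equivalent((1, 2), ("x", "y"), rel_a, rel_b, arities))  # True
print(tuples_equivalent((1, 2), ("y", "x"), rel_a, rel_b, arities))  # False
```

The number of index tuples is $(w + 1)^{\alpha}$ per predicate, which is a constant for fixed signature and fixed width.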






We are now ready to generalize results from [28] (dealing with trees plus a distinguished node) to the
case of finite structures of bounded treewidth over an arbitrary signature $\tau$. In the three lemmas below, let
$k \geq 0$ and $w \geq 1$ be arbitrary natural numbers and let $\tau$ be an arbitrary signature.

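Lemma 3.5 Let $\mathcal{A}$ and $\mathcal{B}$ be $\tau$-structures, let $\mathcal{S}$ (resp. $\mathcal{T}$) be a normalized tree decomposition of $\mathcal{A}$ (resp. of
$\mathcal{B}$) of width $w$, and let $s$ (resp. $t$) be an internal node in $\mathcal{S}$ (resp. in $\mathcal{T}$).

(1) permutation nodes. Let $s'$ (resp. $t'$) be the only child of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$). Moreover, let $\bar{a}$, $\bar{a}'$, $\bar{b}$,
and $\bar{b}'$ denote the bags at the nodes $s$, $s'$, $t$, and $t'$, respectively.

If $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$ and there exists a permutation $\pi$, s.t. $\bar{a} = \pi(\bar{a}')$ and $\bar{b} = \pi(\bar{b}')$
then $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$.

(2) element replacement nodes. Let $s'$ (resp. $t'$) be the only child of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$). Moreover, let
$\bar{a} = (a_0, a_1, \ldots, a_w)$, $\bar{a}' = (a_0', a_1, \ldots, a_w)$, $\bar{b} = (b_0, b_1, \ldots, b_w)$, and $\bar{b}' = (b_0', b_1, \ldots, b_w)$ denote the
bags at the nodes $s$, $s'$, $t$, and $t'$, respectively.

If $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$ and $\bar{a} \equiv \bar{b}$ then $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$.

(3) branch nodes. Let $s_1$ and $s_2$ (resp. $t_1$ and $t_2$) be the children of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$).

If $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_1}, t_1)$ and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_2}, t_2)$

then $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$.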

Proof. 

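(1) Let $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$. Hence, there exists a winning strategy of the duplicator on these
structures. Moreover, $(a_0, \ldots, a_w)$ and $(b_0, \ldots, b_w)$ are obtained from $(a_0', \ldots, a_w')$ resp. $(b_0', \ldots, b_w')$ by
identical permutations. Thus the duplicator's winning strategy on the structures $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$
is also a winning strategy on $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$.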

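(2) Let $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$. Hence, there exists a winning strategy of the duplicator on these
structures. The duplicator extends this strategy to the structures $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ in the following
way. (We only consider moves of the spoiler in $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$. Moves in $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ are treated analogously.)
Any point or set move which is entirely in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ is answered according to the winning strategy on
the substructures $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$. For moves involving $a_0$, we proceed as follows. If the
spoiler picks $a_0$ in a point move, then the duplicator answers with $b_0$. Likewise, if the spoiler makes
a set move of the form $P \cup \{a_0\}$, where $P$ is a subset of the elements in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$, then the duplicator
answers with $Q \cup \{b_0\}$, where $Q$ is the duplicator's answer to $P$ in the game played on the substructures
$\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$.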

Let $c_1, \ldots, c_m$ and $d_1, \ldots, d_m$ be the elements selected in point moves and $P_1, \ldots, P_n$ and $Q_1, \ldots, Q_n$
be the sets selected in set moves. By the above definition of the duplicator's strategy, every move involving
$a_0$ is answered by the analogous move involving $b_0$. For all other elements, the selected elements clearly
define a partial isomorphism on the structures $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$ extended by the selected sets.
It remains to verify that the selected elements also define a partial isomorphism on the structures $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$
and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ extended by the selected sets. In particular, we have to verify that all relations $R \in \tau$ are
preserved by the selected elements. For any tuples of elements not involving $a_0$ (resp. $b_0$), this is guaranteed
by the fact that the winning strategy on $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s'}, s')$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t'}, t')$ is taken over to the structures
$\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$. On the other hand, by the connectedness condition of tree decompositions, we
can be sure that the only relations on $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ (resp. $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$) involving $a_0$ (resp. $b_0$) are with elements
in the bag $(a_0, \ldots, a_w)$ (resp. $(b_0, \ldots, b_w)$). But then, by the equivalence $(a_0, \ldots, a_w) \equiv (b_0, \ldots, b_w)$, the
preservation of $R \in \tau$ is again guaranteed.

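(3) By the definition of branch nodes, the three nodes $s$, $s_1$, $s_2$ have identical bags, say $(a_0, \ldots, a_w)$. In
particular, since the bag of $s$ introduces no new elements, all elements contained in $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ are either
contained in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1)$ or in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2)$. Moreover, by the connectedness condition, only the
elements $a_0, \ldots, a_w$ occur in both substructures. Of course, the analogous observation holds for $t$, $t_1$, $t_2$, and
$\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$.
By assumption, $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_1}, t_1)$ and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_2}, t_2)$. We define
the duplicator's strategy on $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ by simply combining the winning strategies on the
substructures in the obvious way. (Again we only consider moves of the spoiler in $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$.) That is, if the
spoiler picks some element $c$ of $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$, then the chosen element $c$ is in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$ for some $i \in \{1, 2\}$.
Hence, the duplicator simply answers according to his winning strategy in the game on $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$ and
$\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_i}, t_i)$. On the other hand, suppose that the spoiler picks a set $P$. Then $P$ is of the form $P = P_1 \cup P_2$,
where $P_i$ contains only elements in $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$. Thus, the duplicator simply answers with $Q = Q_1 \cup Q_2$,
where $Q_i$ is the answer to $P_i$ according to the winning strategy in the game on $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_i}, t_i)$.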

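It remains to verify that the selected vertices indeed define a partial isomorphism on the structures
$\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ extended by the selected sets. Again, the only interesting point is that every
relation $R \in \tau$ is preserved by the elements selected in the point moves. If all elements in a tuple $\bar{c}$
(resp. $\bar{d}$) come from the same substructure $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$ (resp. $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_i}, t_i)$), then this is clearly fulfilled
due to the fact that the duplicator's winning strategy on the substructures $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_i}, s_i)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_i}, t_i)$ is
taken over unchanged to the game on $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$. On the other hand, by the connectedness
condition, we can be sure that the only relations between elements from different substructures $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1)$
and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2)$ (resp. $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_1}, t_1)$ and $\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_2}, t_2)$) are with elements in the bag $(a_0, \ldots, a_w)$ (resp.
$(b_0, \ldots, b_w)$) of $s_1$, $s_2$, and $s$ (resp. $t_1$, $t_2$, and $t$). But then, by the equivalences $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1) \equiv_k^{MSO}
\mathcal{I}(\mathcal{B}, \mathcal{T}_{t_1}, t_1)$ and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_2}, t_2)$, the preservation of $R \in \tau$ is again guaranteed. □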

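Lemma 3.6 Let $\mathcal{A}$ and $\mathcal{B}$ be $\tau$-structures, let $\mathcal{S}$ (resp. $\mathcal{T}$) be a normalized tree decomposition of $\mathcal{A}$ (resp. of
$\mathcal{B}$) of width $w$, and let $s$ (resp. $t$) be an internal node in $\mathcal{S}$ (resp. in $\mathcal{T}$).

(1) permutation nodes. Let $s'$ (resp. $t'$) be the only child of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$). Moreover, let $\bar{a}$, $\bar{a}'$, $\bar{b}$,
and $\bar{b}'$ denote the bags at the nodes $s$, $s'$, $t$, and $t'$, respectively.

If $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_t, t)$ and there exists a permutation $\pi$, s.t. $\bar{a} = \pi(\bar{a}')$ and $\bar{b} = \pi(\bar{b}')$,
then $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_{t'}, t')$.

(2) element replacement nodes. Let $s'$ (resp. $t'$) be the only child of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$). Moreover, let
$\bar{a} = (a_0, a_1, \ldots, a_w)$, $\bar{a}' = (a'_0, a_1, \ldots, a_w)$, $\bar{b} = (b_0, b_1, \ldots, b_w)$, and $\bar{b}' = (b'_0, b_1, \ldots, b_w)$ denote the
bags at the nodes $s$, $s'$, $t$, and $t'$, respectively.

If $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_t, t)$ and $\bar{a}' \equiv \bar{b}'$, then $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_{s'}, s') \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_{t'}, t')$.

(3) branch nodes. Let $s_1$ and $s_2$ (resp. $t_1$ and $t_2$) be the children of $s$ in $\mathcal{S}$ (resp. of $t$ in $\mathcal{T}$).

If $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_t, t)$ and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_2}, s_2) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_2}, t_2)$, then
$\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_{s_1}, s_1) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_{t_1}, t_1)$.

If $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_t, t)$ and $\mathcal{I}(\mathcal{A}, \mathcal{S}_{s_1}, s_1) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_{t_1}, t_1)$, then
$\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_{s_2}, s_2) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_{t_2}, t_2)$.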

Proof. The proof is by Ehrenfeucht-Fraïssé games, analogously to Lemma 3.5. □

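Lemma 3.7 Let $\mathcal{A}$ and $\mathcal{B}$ be $\tau$-structures, let $\mathcal{S}$ (resp. $\mathcal{T}$) be a normalized tree decomposition of $\mathcal{A}$ (resp.
of $\mathcal{B}$) of width $w$, and let $s$ (resp. $t$) be an arbitrary node in $\mathcal{S}$ (resp. in $\mathcal{T}$), whose bag is $(a_0, \ldots, a_w)$ (resp.
$(b_0, \ldots, b_w)$).

If $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \mathcal{T}_t, t)$ and $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s) \equiv_k^{MSO} \mathcal{I}(\mathcal{B}, \bar{\mathcal{T}}_t, t)$, then $(\mathcal{A}, a_i) \equiv_k^{MSO} (\mathcal{B}, b_i)$ for every
$i \in \{0, \ldots, w\}$.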

Proof. Again, the proof is by Ehrenfeucht-Fraïssé games, analogously to Lemma 3.5. □

Discussion. Lemma 3.5 provides the intuition for how to determine the $k$-type of the substructure induced by a
subtree $\mathcal{S}_s$ via a bottom-up traversal of the tree decomposition $\mathcal{S}$. The three cases in the lemma refer to the
three kinds of nodes which the root node $s$ of this subtree can have. The essence of the lemma is that the
type of the structure induced by $\mathcal{S}_s$ is fully determined by the type of the structure induced by the subtree
rooted at the child node(s) plus the relations between elements in the bag at node $s$. Of course, this is no
big surprise. Analogously, Lemma 3.6 deals with the $k$-type of the substructure induced by a subtree $\bar{\mathcal{S}}_s$,
which can be obtained via a top-down traversal of $\mathcal{S}$. Finally, Lemma 3.7 shows how the $k$-type of the
substructures induced by $\mathcal{S}_s$ and $\bar{\mathcal{S}}_s$ fully determines the type of the entire structure $\mathcal{A}$ extended by some
domain element from the bag of $s$.

4 Monadic Datalog 

In this section, we introduce two restricted fragments of datalog, namely monadic datalog over finite structures
with bounded treewidth and the quasi-guarded fragment of datalog. Let $\tau = \{R_1, \ldots, R_K\}$ be a set
of predicate symbols and let $w \geq 1$ denote the treewidth. Then we define the following extended signature
$\tau_{td}$:

$\tau_{td} = \tau \cup \{root, leaf, child_1, child_2, bag\}$

where the unary predicates $root$ and $leaf$ as well as the binary predicates $child_1$ and $child_2$ are used to
represent the tree $T$ of the normalized tree decomposition in the obvious way. For instance, we write
$child_1(s_1, s)$ to denote that $s_1$ is either the first child or the only child of $s$. Finally, $bag$ has arity $w + 2$,
where $bag(t, a_0, \ldots, a_w)$ means that the bag at node $t$ is $(a_0, \ldots, a_w)$.

Definition 4.1 Let $\tau$ be a set of predicate symbols and let $w \geq 1$. A monadic datalog program over
$\tau$-structures with treewidth $w$ is a set of datalog rules where all extensional predicates are from $\tau_{td}$ and all
intensional predicates are unary.

For any $\tau$-structure $\mathcal{A}$ with normalized tree decomposition $\mathcal{T} = (T, (A_t)_{t \in T})$ of width $w$, we denote by
$\mathcal{A}_{td}$ the $\tau_{td}$-structure representing $\mathcal{A}$ plus $\mathcal{T}$ as follows: The domain of $\mathcal{A}_{td}$ is the union of $dom(\mathcal{A})$ and
the nodes of $T$. In addition to the relations $R_i^{\mathcal{A}}$ with $R_i \in \tau$, the structure $\mathcal{A}_{td}$ also contains relations for
each predicate $root$, $leaf$, $child_1$, $child_2$, and $bag$, thus representing the tree decomposition $\mathcal{T}$. By l^l, one
can compute $\mathcal{A}_{td}$ from $\mathcal{A}$ in linear time w.r.t. the size of $\mathcal{A}$. Hence, the size of $\mathcal{A}_{td}$ (for some reasonable
encoding, see e.g. [13]) is also linearly bounded by the size of $\mathcal{A}$.

Example 4.2 Recall the relational schema $(R, F)$ represented by the structure $\mathcal{A}$ from Example 2.2 with
normalized tree decomposition $\mathcal{T}'$ in Figure 2. The domain of $\mathcal{A}_{td}$ is the union of $dom(\mathcal{A})$ and the tree
nodes {si, . . . , S22}. The corresponding Ttd structure Atd representing the relational schema plus tree de- 
composition T' is made up by the following set of ground atoms: root{si), leaf{si2), leafisi^), leaf{sig), 
childi{s2, si), child2{s3, si), . . ., hag{si, /a, d, e), . . .. 
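The encoding of a tree decomposition as ground atoms over $\tau_{td}$ can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the function name `encode_td`, the dictionary-based input format, and the representation of atoms as tuples are hypothetical and not taken from the prototype implementation mentioned in the paper.

```python
from typing import Dict, Hashable, List, Set, Tuple


def encode_td(
    root: Hashable,
    children: Dict[Hashable, List[Hashable]],
    bags: Dict[Hashable, Tuple[Hashable, ...]],
) -> Set[tuple]:
    """Return the ground atoms for root, leaf, child1, child2 and bag.

    Hypothetical input format: `children` maps a node to its ordered child
    list (at most two children), `bags` maps every node to its bag tuple.
    """
    facts = {("root", root)}
    for node, bag in bags.items():
        # bag(t, a_0, ..., a_w): the node followed by the bag elements
        facts.add(("bag", node) + tuple(bag))
        kids = children.get(node, [])
        if len(kids) > 2:
            raise ValueError("a normalized tree decomposition has at most two children per node")
        if not kids:
            facts.add(("leaf", node))
        for position, kid in enumerate(kids, start=1):
            # child_i(s_i, s): s_i is the i-th child of s
            facts.add((f"child{position}", kid, node))
    return facts
```

The number of atoms produced is linear in the number of tree nodes, in line with the linear size bound on $\mathcal{A}_{td}$ stated above.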

As we recalled in Section 2.4, the evaluation of monadic datalog is NP-complete (combined complexity).
However, the target of our transformation from MSO to datalog will be a further restricted fragment of
datalog, which we refer to as "quasi-guarded". The evaluation of this fragment can be easily shown to be
tractable.

Definition 4.3 Let $B$ be an atom and $y$ a variable in some rule $r$. We call $y$ "functionally dependent" on $B$
if in every ground instantiation $r'$ of $r$, the value of $y$ is uniquely determined by the value of $B$.

We call a datalog program $\mathcal{P}$ "quasi-guarded" if every rule $r$ contains an extensional atom $B$, s.t. every
variable occurring in $r$ either occurs in $B$ or is functionally dependent on $B$.

Theorem 4.4 Let $\mathcal{P}$ be a quasi-guarded datalog program and let $\mathcal{A}$ be a finite structure. Then $\mathcal{P}$ can be
evaluated over $\mathcal{A}$ in time $O(|\mathcal{P}| * |\mathcal{A}|)$, where $|\mathcal{P}|$ denotes the size of the datalog program and $|\mathcal{A}|$ denotes
the size of the data.






Proof. Let $r$ be a rule in the program $\mathcal{P}$ and let $B$ be the "quasi-guard" of $r$, i.e., all variables in $r$ either
occur in $B$ or are functionally dependent on $B$. In order to compute all possible ground instances $r'$ of $r$
over $\mathcal{A}$, we first instantiate $B$. The maximal number of such instantiations is clearly bounded by $|\mathcal{A}|$. Since
all other variables occurring in $r$ are functionally dependent on the variables in $B$, in fact the number of all
possible ground instantiations $r'$ of $r$ is bounded by $|\mathcal{A}|$.

Hence, in total, the ground program $\mathcal{P}'$ consisting of all possible ground instantiations of the rules in
$\mathcal{P}$ has size $O(|\mathcal{P}| * |\mathcal{A}|)$ and also the computation of these ground rules fits into the linear time bound.
As we recalled in Section 2.4, the ground program $\mathcal{P}'$ can be evaluated over $\mathcal{A}$ in time $O(|\mathcal{P}'| + |\mathcal{A}|) =
O((|\mathcal{P}| * |\mathcal{A}|) + |\mathcal{A}|) = O(|\mathcal{P}| * |\mathcal{A}|)$. □
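The grounding step used in this proof can be illustrated by a small Python sketch. It assumes, purely for illustration, that every functional dependency is available as a dictionary lookup (for instance, the map from a node to its first child); the function name `ground_quasi_guarded` and its parameter layout are hypothetical and not part of the paper.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def ground_quasi_guarded(
    guard_vars: Sequence[str],
    guard_facts: Iterable[Tuple[Hashable, ...]],
    dependents: Sequence[Tuple[str, str, Dict[Hashable, Hashable]]],
) -> List[Dict[str, Hashable]]:
    """Enumerate the variable bindings of one quasi-guarded rule.

    guard_vars  -- variable names at the argument positions of the guard B
    guard_facts -- the tuples of the extensional relation of B
    dependents  -- triples (new_var, key_var, lookup): new_var is functionally
                   dependent on key_var via the dictionary `lookup`
    """
    instances = []
    for fact in guard_facts:  # one pass over the guard relation
        binding = dict(zip(guard_vars, fact))
        complete = True
        for new_var, key_var, lookup in dependents:
            key = binding[key_var]
            if key not in lookup:  # no value exists, so no ground instance
                complete = False
                break
            binding[new_var] = lookup[key]
        if complete:
            instances.append(binding)
    return instances
```

Each guard fact yields at most one binding, so the number of ground instances of a rule never exceeds the size of the guard relation, which is the counting argument behind the $O(|\mathcal{P}| * |\mathcal{A}|)$ bound.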

Before we state the main result concerning the expressive power of monadic datalog over structures
with bounded treewidth, we introduce the following notation. In order to simplify the exposition below,
we assume that all predicates $R_i \in \tau$ have the same arity $r$. First, this can be easily achieved by copying
columns in relations with smaller arity. Moreover, it is easily seen that the results also hold without this
restriction.

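It is convenient to use the following abbreviations. Let $\bar{a} = (a_0, \ldots, a_w)$ be a tuple of domain elements.
Then we write $\mathcal{R}(\bar{a})$ to denote the set of all ground atoms with predicates in $\tau = \{R_1, \ldots, R_K\}$ and
arguments in $\{a_0, \ldots, a_w\}$, i.e.,

$\mathcal{R}(\bar{a}) = \bigcup_{i=1}^{K} \bigcup_{j_1=0}^{w} \cdots \bigcup_{j_r=0}^{w} \{ R_i(a_{j_1}, \ldots, a_{j_r}) \}$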
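As a concrete reading of this definition, the following Python sketch enumerates $\mathcal{R}(\bar{a})$ for a given bag. The function name `ground_atoms` and the representation of an atom as a tuple (predicate name followed by arguments) are assumptions made for this illustration only.

```python
from itertools import product


def ground_atoms(predicates, arity, bag):
    """All atoms R_i(a_{j_1}, ..., a_{j_r}) with R_i in `predicates`
    and every argument drawn from `bag` (repetitions allowed)."""
    return {
        (name,) + args
        for name in predicates
        for args in product(bag, repeat=arity)
    }
```

For $K$ predicates of arity $r$ and a bag of $w + 1$ distinct elements, the set has $K \cdot (w+1)^r$ atoms, a number that depends only on $\tau$ and $w$ and not on the input structure.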

Let $\mathcal{A}$ be a structure with tree decomposition $\mathcal{T}$ and let $s$ be a node in $\mathcal{T}$ whose bag is $\bar{a} = (a_0, \ldots, a_w)$.
Then we write $(\mathcal{A}, s)$ as a short-hand for the structure $(\mathcal{A}, \bar{a})$ with distinguished constants $\bar{a} = (a_0, \ldots, a_w)$.

Theorem 4.5 Let $\tau$ and $w \geq 1$ be arbitrary but fixed. Every MSO-definable unary query over $\tau$-structures
of treewidth $w$ is also definable in the quasi-guarded fragment of monadic datalog over $\tau_{td}$.

Proof. Let $\varphi(x)$ be an arbitrary MSO formula with free variable $x$ and quantifier depth $k$. We have to
construct a monadic datalog program $\mathcal{P}$ with distinguished predicate $\varphi$ which defines the same query.

W.l.o.g., we only consider the case of structures whose domain has $\geq w + 1$ elements. We maintain two
disjoint sets of $k$-types $\Theta^{\uparrow}$ and $\Theta^{\downarrow}$, representing $k$-types of structures $(\mathcal{A}, \bar{a})$ of the following form: $\mathcal{A}$ has a
tree decomposition $\mathcal{T}$ of width $w$ and $\bar{a}$ is the bag of some node $s$ in $\mathcal{T}$. Moreover, for $\Theta^{\uparrow}$, we require that $s$
is the root of $\mathcal{T}$ while, for $\Theta^{\downarrow}$, we require that $s$ is a leaf node of $\mathcal{T}$. We maintain for each type $\vartheta$ a witness
$W(\vartheta) = (\mathcal{A}, \mathcal{T}, s)$. The types in $\Theta^{\uparrow}$ and $\Theta^{\downarrow}$ will serve as predicate names in the monadic datalog program
to be constructed. Initially, $\Theta^{\uparrow} = \Theta^{\downarrow} = \mathcal{P} = \emptyset$.

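1. "Bottom-up" construction of $\Theta^{\uparrow}$.

Base Case. Let $a_0, \ldots, a_w$ be pairwise distinct elements and let $\mathcal{S}$ be a tree decomposition consisting
of a single node $s$, whose bag is $A_s = (a_0, \ldots, a_w)$. Then we consider all possible structures $(\mathcal{A}, s)$ with
this tree decomposition. In particular, $dom(\mathcal{A}) = \{a_0, \ldots, a_w\}$. We get all possible structures with tree
decomposition $\mathcal{S}$ by letting the EDB $\mathcal{E}(\mathcal{A})$ be any subset of $\mathcal{R}(\bar{a})$. For every such structure $(\mathcal{A}, s)$, we check
if there exists a type $\vartheta \in \Theta^{\uparrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t. $(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take
it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\uparrow}$ and set $W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the
following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_0, \ldots, x_w),\ leaf(v),\ \{R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \in \mathcal{E}(\mathcal{A})\},$
$\{\neg R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \notin \mathcal{E}(\mathcal{A})\}.$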

Induction step. We construct new structures by extending the tree decompositions of existing witnesses
in "bottom-up" direction, i.e., by introducing a new root node. This root node may be one of three kinds of
nodes.

(a) Permutation nodes. For each $\vartheta' \in \Theta^{\uparrow}$, let $W(\vartheta') = (\mathcal{A}, \mathcal{S}', s')$ with bag $A_{s'} = (a_0, \ldots, a_w)$ at the
root $s'$ in $\mathcal{S}'$. Then we consider all possible triples $(\mathcal{A}, \mathcal{S}, s)$, where $\mathcal{S}$ is obtained from $\mathcal{S}'$ by appending
$s'$ to a new root node $s$, s.t. $s$ is a permutation node, i.e., there exists some permutation $\pi$, s.t. $A_s =
(a_{\pi(0)}, \ldots, a_{\pi(w)})$.

For every such structure $(\mathcal{A}, s)$, we check if there exists a type $\vartheta \in \Theta^{\uparrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\uparrow}$ and set
$W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_{\pi(0)}, \ldots, x_{\pi(w)}),\ child_1(v', v),\ \vartheta'(v'),\ bag(v', x_0, \ldots, x_w).$

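(b) Element replacement nodes. For each $\vartheta' \in \Theta^{\uparrow}$, let $W(\vartheta') = (\mathcal{A}', \mathcal{S}', s')$ with bag $A_{s'} = (a'_0, a_1, \ldots, a_w)$
at the root $s'$ in $\mathcal{S}'$. Then we consider all possible triples $(\mathcal{A}, \mathcal{S}, s)$, where $\mathcal{S}$ is obtained from $\mathcal{S}'$ by appending
$s'$ to a new root node $s$, s.t. $s$ is an element replacement node. For the tree decomposition $\mathcal{S}$, we thus
invent some new element $a_0$ and set $A_s = (a_0, a_1, \ldots, a_w)$. For this tree decomposition $\mathcal{S}$, we consider
all possible structures $\mathcal{A}$ with $dom(\mathcal{A}) = dom(\mathcal{A}') \cup \{a_0\}$ where the EDB $\mathcal{E}(\mathcal{A}')$ is extended to the EDB
$\mathcal{E}(\mathcal{A})$ by new ground atoms from $\mathcal{R}(\bar{a})$, s.t. $a_0$ occurs as argument of all ground atoms in $\mathcal{E}(\mathcal{A}) \setminus \mathcal{E}(\mathcal{A}')$.

For every such structure $(\mathcal{A}, s)$, we check if there exists a type $\vartheta \in \Theta^{\uparrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\uparrow}$ and set
$W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_0, x_1, \ldots, x_w),\ child_1(v', v),\ \vartheta'(v'),\ bag(v', x'_0, x_1, \ldots, x_w),$
$\{R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \in \mathcal{E}(\mathcal{A})\},$
$\{\neg R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \notin \mathcal{E}(\mathcal{A})\}.$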

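(c) Branch nodes. Let $\vartheta_1$, $\vartheta_2$ be two (not necessarily distinct) types in $\Theta^{\uparrow}$ with $W(\vartheta_1) = (\mathcal{A}_1, \mathcal{S}_1, s_1)$
and $W(\vartheta_2) = (\mathcal{A}_2, \mathcal{S}_2, s_2)$. Let $A_{s_1} = (a_0, \ldots, a_w)$ and $A_{s_2} = (b_0, \ldots, b_w)$, respectively. Moreover, let
$dom(\mathcal{A}_1) \cap dom(\mathcal{A}_2) = \emptyset$.

Let $\delta$ be a renaming function with $\delta = \{a_0 \leftarrow b_0, \ldots, a_w \leftarrow b_w\}$. By applying $\delta$ to $(\mathcal{A}_2, \mathcal{S}_2, s_2)$,
we obtain a new triple $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$ with $\mathcal{A}'_2 = \mathcal{A}_2\delta$ and $\mathcal{S}'_2 = \mathcal{S}_2\delta$. In particular, we thus have $A_{s_2}\delta =
(a_0, \ldots, a_w)$. Clearly, $(\mathcal{A}_2, s_2) \equiv_k^{MSO} (\mathcal{A}'_2, s_2)$ holds.

For every such pair $(\mathcal{A}_1, \mathcal{S}_1, s_1)$ and $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$, we check if the EDBs are inconsistent, i.e., $\mathcal{E}(\mathcal{A}_1) \cap
\mathcal{R}(\bar{a}) \neq \mathcal{E}(\mathcal{A}'_2) \cap \mathcal{R}(\bar{a})$. If this is the case, then we ignore this pair. Otherwise, we construct a new
tree decomposition $\mathcal{S}$ with a new root node $s$, whose child nodes are $s_1$ and $s_2$. As the bag of $s$, we
set $A_s = A_{s_1} = A_{s_2}\delta$. By construction, $\mathcal{S}$ is a normalized tree decomposition of the structure $\mathcal{A}$ with
$dom(\mathcal{A}) = dom(\mathcal{A}_1) \cup dom(\mathcal{A}'_2)$ and EDB $\mathcal{E}(\mathcal{A}) = \mathcal{E}(\mathcal{A}_1) \cup \mathcal{E}(\mathcal{A}'_2)$.

As in the cases above, we have to check if there exists a type $\vartheta \in \Theta^{\uparrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\uparrow}$ and set
$W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_0, x_1, \ldots, x_w),\ child_1(v_1, v),\ \vartheta_1(v_1),\ child_2(v_2, v),\ \vartheta_2(v_2),$
$bag(v_1, x_0, x_1, \ldots, x_w),\ bag(v_2, x_0, x_1, \ldots, x_w).$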

2. "Top-down" construction of $\Theta^{\downarrow}$.

Base Case. Let $a_0, \ldots, a_w$ be pairwise distinct elements and let $\mathcal{S}$ be a tree decomposition consisting
of a single node $s$, whose bag is $A_s = (a_0, \ldots, a_w)$. Then we consider all possible structures $(\mathcal{A}, s)$ with
this tree decomposition. In particular, $dom(\mathcal{A}) = \{a_0, \ldots, a_w\}$. We get all possible structures with tree
decomposition $\mathcal{S}$ by letting the EDB $\mathcal{E}(\mathcal{A})$ be any subset of $\mathcal{R}(\bar{a})$. For every such structure $(\mathcal{A}, s)$, we check
if there exists a type $\vartheta \in \Theta^{\downarrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t. $(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take
it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\downarrow}$ and set $W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the
following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_0, \ldots, x_w),\ root(v),\ \{R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \in \mathcal{E}(\mathcal{A})\},$
$\{\neg R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \notin \mathcal{E}(\mathcal{A})\}.$

Induction step. We construct new structures by extending the tree decompositions of existing witnesses
in "top-down" direction, i.e., by introducing a new leaf node $s$ and appending it as new child to a former
leaf node $s'$. The node $s'$ may thus become one of three kinds of nodes in a normalized tree decomposition.

(a) Permutation nodes. For each $\vartheta' \in \Theta^{\downarrow}$, let $W(\vartheta') = (\mathcal{A}, \mathcal{S}', s')$ with bag $A_{s'} = (a_0, \ldots, a_w)$ at
some leaf node $s'$ in $\mathcal{S}'$. Then we consider all possible triples $(\mathcal{A}, \mathcal{S}, s)$, where $\mathcal{S}$ is obtained from $\mathcal{S}'$ by
appending $s$ as a new child of $s'$, s.t. $s'$ is a permutation node, i.e., there exists some permutation $\pi$, s.t.
$A_s = (a_{\pi(0)}, \ldots, a_{\pi(w)})$.

For every such structure $(\mathcal{A}, s)$, we check if there exists a type $\vartheta \in \Theta^{\downarrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\downarrow}$ and set
$W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_{\pi(0)}, \ldots, x_{\pi(w)}),\ child_1(v, v'),\ \vartheta'(v'),\ bag(v', x_0, \ldots, x_w).$

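(b) Element replacement nodes. For each $\vartheta' \in \Theta^{\downarrow}$, let $W(\vartheta') = (\mathcal{A}', \mathcal{S}', s')$ with bag $A_{s'} = (a'_0, a_1, \ldots,
a_w)$ at leaf node $s'$ in $\mathcal{S}'$. Then we consider all possible triples $(\mathcal{A}, \mathcal{S}, s)$, where $\mathcal{S}$ is obtained from $\mathcal{S}'$ by
appending $s$ as new child of $s'$, s.t. $s'$ is an element replacement node. For the tree decomposition $\mathcal{S}$, we thus
invent some new element $a_0$ and set $A_s = (a_0, a_1, \ldots, a_w)$. For this tree decomposition $\mathcal{S}$, we consider
all possible structures $\mathcal{A}$ with $dom(\mathcal{A}) = dom(\mathcal{A}') \cup \{a_0\}$ where the EDB $\mathcal{E}(\mathcal{A}')$ is extended to the EDB
$\mathcal{E}(\mathcal{A})$ by new ground atoms from $\mathcal{R}(\bar{a})$, s.t. $a_0$ occurs as argument of all ground atoms in $\mathcal{E}(\mathcal{A}) \setminus \mathcal{E}(\mathcal{A}')$.

For every such structure $(\mathcal{A}, s)$, we check if there exists a type $\vartheta \in \Theta^{\downarrow}$ with $W(\vartheta) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}, s) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta$ exists, we take it. Otherwise we invent a new token $\vartheta$, add it to $\Theta^{\downarrow}$ and set
$W(\vartheta) := (\mathcal{A}, \mathcal{S}, s)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta(v) \leftarrow bag(v, x_0, x_1, \ldots, x_w),\ child_1(v, v'),\ \vartheta'(v'),\ bag(v', x'_0, x_1, \ldots, x_w),$
$\{R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \in \mathcal{E}(\mathcal{A})\},$
$\{\neg R_i(x_{j_1}, \ldots, x_{j_r}) \mid R_i(a_{j_1}, \ldots, a_{j_r}) \notin \mathcal{E}(\mathcal{A})\}.$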

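(c) Branch nodes. Let $\vartheta \in \Theta^{\downarrow}$ and $\vartheta_2 \in \Theta^{\uparrow}$ with $W(\vartheta) = (\mathcal{A}, \mathcal{S}, s)$ and $W(\vartheta_2) = (\mathcal{A}_2, \mathcal{S}_2, s_2)$. Note that
$s$ is a leaf in $\mathcal{S}$ while $s_2$ is the root of $\mathcal{S}_2$. Now let $A_s = (a_0, \ldots, a_w)$ and $A_{s_2} = (b_0, \ldots, b_w)$, respectively,
and let $dom(\mathcal{A}) \cap dom(\mathcal{A}_2) = \emptyset$.

Let $\delta$ be a renaming function with $\delta = \{a_0 \leftarrow b_0, \ldots, a_w \leftarrow b_w\}$. By applying $\delta$ to $(\mathcal{A}_2, \mathcal{S}_2, s_2)$,
we obtain a new triple $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$ with $\mathcal{A}'_2 = \mathcal{A}_2\delta$ and $\mathcal{S}'_2 = \mathcal{S}_2\delta$. In particular, we thus have $A_{s_2}\delta =
(a_0, \ldots, a_w)$. Clearly, $(\mathcal{A}_2, s_2) \equiv_k^{MSO} (\mathcal{A}'_2, s_2)$ holds.

For every such pair $(\mathcal{A}, \mathcal{S}, s)$ and $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$, we check if the EDBs are inconsistent, i.e., $\mathcal{E}(\mathcal{A}) \cap
\mathcal{R}(\bar{a}) \neq \mathcal{E}(\mathcal{A}'_2) \cap \mathcal{R}(\bar{a})$. If this is the case, then we ignore this pair. Otherwise, we construct a new tree
decomposition $\mathcal{S}_1$ by introducing a new leaf node $s_1$ and appending both $s_1$ and $s_2$ as child nodes of $s$. As
the bag of $s_1$, we set $A_{s_1} = A_s = A_{s_2}\delta$. By construction, $\mathcal{S}_1$ is a normalized tree decomposition of the
structure $\mathcal{A}_1$ with $dom(\mathcal{A}_1) = dom(\mathcal{A}) \cup dom(\mathcal{A}'_2)$ and EDB $\mathcal{E}(\mathcal{A}_1) = \mathcal{E}(\mathcal{A}) \cup \mathcal{E}(\mathcal{A}'_2)$.

As in the cases above, we have to check if there exists a type $\vartheta_1 \in \Theta^{\downarrow}$ with $W(\vartheta_1) = (\mathcal{B}, \mathcal{T}, t)$, s.t.
$(\mathcal{A}_1, s_1) \equiv_k^{MSO} (\mathcal{B}, t)$. If such a $\vartheta_1$ exists, we take it. Otherwise we invent a new token $\vartheta_1$, add it to $\Theta^{\downarrow}$
and set $W(\vartheta_1) := (\mathcal{A}_1, \mathcal{S}_1, s_1)$. In any case, we add the following rule to the program $\mathcal{P}$:

$\vartheta_1(v_1) \leftarrow bag(v_1, x_0, x_1, \ldots, x_w),\ child_1(v_1, v),\ child_2(v_2, v),\ \vartheta(v),\ \vartheta_2(v_2),$
$bag(v, x_0, x_1, \ldots, x_w),\ bag(v_2, x_0, x_1, \ldots, x_w).$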

Now suppose that $\mathcal{S}_1$ is constructed from $\mathcal{S}$ and $\mathcal{S}'_2$ by attaching the new node $s_1$ as second child of $s$ and
$s_2$ as the first child. In this case, the structure $\mathcal{A}_1$ remains exactly the same as in the case above, since the
order of the child nodes of a node in the tree decomposition is irrelevant. Thus, whenever the above rule is
added to the program $\mathcal{P}$, then also the following rule is added:

$\vartheta_1(v_2) \leftarrow bag(v_2, x_0, x_1, \ldots, x_w),\ child_1(v_1, v),\ child_2(v_2, v),\ \vartheta(v),\ \vartheta_2(v_1),$
$bag(v, x_0, x_1, \ldots, x_w),\ bag(v_1, x_0, x_1, \ldots, x_w).$

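3. Element selection.

We consider all pairs of types $\vartheta_1 \in \Theta^{\uparrow}$ and $\vartheta_2 \in \Theta^{\downarrow}$. Let $W(\vartheta_1) = (\mathcal{A}_1, \mathcal{S}_1, s_1)$ and $W(\vartheta_2) =
(\mathcal{A}_2, \mathcal{S}_2, s_2)$. Moreover, let $A_{s_1} = (a_0, \ldots, a_w)$ and $A_{s_2} = (b_0, \ldots, b_w)$, respectively, and let $dom(\mathcal{A}_1) \cap
dom(\mathcal{A}_2) = \emptyset$.

Let $\delta$ be a renaming function with $\delta = \{a_0 \leftarrow b_0, \ldots, a_w \leftarrow b_w\}$. By applying $\delta$ to $(\mathcal{A}_2, \mathcal{S}_2, s_2)$,
we obtain a new triple $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$ with $\mathcal{A}'_2 = \mathcal{A}_2\delta$ and $\mathcal{S}'_2 = \mathcal{S}_2\delta$. In particular, we thus have $A_{s_2}\delta =
(a_0, \ldots, a_w)$. Clearly, $(\mathcal{A}_2, s_2) \equiv_k^{MSO} (\mathcal{A}'_2, s_2)$ holds.

For every such pair $(\mathcal{A}_1, \mathcal{S}_1, s_1)$ and $(\mathcal{A}'_2, \mathcal{S}'_2, s_2)$, we check if the EDBs are inconsistent, i.e., $\mathcal{E}(\mathcal{A}_1) \cap
\mathcal{R}(\bar{a}) \neq \mathcal{E}(\mathcal{A}'_2) \cap \mathcal{R}(\bar{a})$. If this is the case, then we ignore this pair. Otherwise, we construct a new tree
decomposition $\mathcal{S}$ by identifying $s_1$ (= the root of $\mathcal{S}_1$) with $s_2$ (= a leaf of $\mathcal{S}'_2$). By construction, $\mathcal{S}$ is a
normalized tree decomposition of the structure $\mathcal{A}$ with $dom(\mathcal{A}) = dom(\mathcal{A}_1) \cup dom(\mathcal{A}'_2)$ and $\mathcal{E}(\mathcal{A}) =
\mathcal{E}(\mathcal{A}_1) \cup \mathcal{E}(\mathcal{A}'_2)$.

Now check for each $a_i$ in $A_{s_1} = A_{s_2}\delta$, if $\mathcal{A} \models \varphi(a_i)$. If this is the case, then we add the following rule
to $\mathcal{P}$:

$\varphi(x_i) \leftarrow \vartheta_1(v),\ \vartheta_2(v),\ bag(v, x_0, \ldots, x_w).$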

We claim that the program $\mathcal{P}$ with distinguished monadic predicate $\varphi$ is the desired monadic datalog
program, i.e., let $\mathcal{A}$ be an arbitrary input $\tau$-structure with tree decomposition $\mathcal{S}$ and let $\mathcal{A}_{td}$ denote the
corresponding $\tau_{td}$-structure. Moreover, let $a \in dom(\mathcal{A})$. Then the following equivalence holds: $\mathcal{A} \models
\varphi(a)$ iff $\varphi(a)$ is in the least fixpoint of $\mathcal{P} \cup \mathcal{A}_{td}$.

Note that the intensional predicates in $\Theta^{\uparrow}$, $\Theta^{\downarrow}$, and $\{\varphi\}$ are layered in that we can first compute the least
fixpoint of the predicates in $\Theta^{\uparrow}$, then $\Theta^{\downarrow}$, and finally $\varphi$.

The bottom-up construction of $\Theta^{\uparrow}$ guarantees that we indeed construct all possible types of structures
$(\mathcal{B}, t)$ with tree decomposition $\mathcal{T}$ and root $t$. This can be easily shown by Lemma 3.5 and an induction on
the size of the tree decomposition $\mathcal{T}$. On the other hand, for every subtree $\mathcal{S}_s$ of $\mathcal{S}$, the type of the induced
substructure $\mathcal{I}(\mathcal{A}, \mathcal{S}_s, s)$ is $\vartheta$ for some $\vartheta \in \Theta^{\uparrow}$ if and only if the atom $\vartheta(s)$ is in the least fixpoint of $\mathcal{P} \cup \mathcal{A}_{td}$.
Again this can be shown by an easy induction argument using Lemma 3.5.

Analogously, we may conclude via Lemma 3.6 that $\Theta^{\downarrow}$ contains all possible types of structures $(\mathcal{B}, t)$
with tree decomposition $\mathcal{T}$ and some leaf node $t$. Moreover, for every subtree $\bar{\mathcal{S}}_s$ of $\mathcal{S}$, the type of the
induced substructure $\mathcal{I}(\mathcal{A}, \bar{\mathcal{S}}_s, s)$ is $\vartheta$ for some $\vartheta \in \Theta^{\downarrow}$ if and only if the atom $\vartheta(s)$ is in the least fixpoint
of $\mathcal{P} \cup \mathcal{A}_{td}$. The definition of the predicate $\varphi$ in part 3 is a direct realization of Lemma 3.7. It thus follows
that $\mathcal{A} \models \varphi(a)$ iff $\varphi(a)$ is in the least fixpoint of $\mathcal{P} \cup \mathcal{A}_{td}$.

Finally, an inspection of all datalog rules added to $\mathcal{P}$ by this construction shows that these rules are
indeed quasi-guarded, i.e., they all contain an atom $B$ with an extensional predicate, s.t. all other variables
in this rule are functionally dependent on the variables in $B$. For instance, in the rule added for $\Theta^{\uparrow}$ in case of
a branch node, the atom $bag(v, x_0, \ldots, x_w)$ is the quasi-guard. Indeed, the remaining variables $v_1$ and $v_2$
in this rule are functionally dependent on $v$ via the atoms $child_1(v_1, v)$ and $child_2(v_2, v)$. □

Above all, Theorem 4.5 is an expressivity result. However, it can of course also be used to derive a
complexity result. Indeed, we can state a slightly extended version of Courcelle's Theorem as a corollary
(which is in turn a special case of Theorem 4.12 in [13]).

Corollary 4.6 The evaluation problem of unary MSO-queries $\varphi(x)$ over $\tau$-structures $\mathcal{A}$ with treewidth $w$
can be solved in time $O(f(|\varphi(x)|, w) * |\mathcal{A}|)$ for some function $f$.

Proof. Suppose that we are given an MSO-query $\varphi(x)$ and some treewidth $w$. By Theorem 4.5, we can
construct an equivalent, quasi-guarded datalog program $\mathcal{P}$. The whole construction is independent of the
data. Hence, the time for this construction and the size of $\mathcal{P}$ are both bounded by some term $f(|\varphi(x)|, w)$.
By US), a tree decomposition $\mathcal{T}$ of $\mathcal{A}$ and, therefore, also the extended structure $\mathcal{A}_{td}$ can be computed in
time $O(|\mathcal{A}|)$. Finally, by Theorem 4.4, the quasi-guarded program $\mathcal{P}$ can be evaluated over $\mathcal{A}_{td}$ in time
$O(|\mathcal{P}| * |\mathcal{A}_{td}|)$, from which the desired overall time bound follows. □

Discussion. Clearly, Theorem 4.5 is not only applicable to MSO-definable unary queries but also to 0-ary
queries, i.e., MSO-queries defining a decision problem. An inspection of the proof of Theorem 4.5 reveals
that several simplifications are possible in this case. Above all, the whole "top-down" construction of $\Theta^{\downarrow}$
can be omitted. Moreover, the rules with head predicate $\varphi$ are now much simpler: Let $\varphi$ be a 0-ary MSO-formula
and let $\Theta^{\uparrow}$ denote the set of types obtained by the "bottom-up" construction in the above proof.
Then we define $\Theta^{\uparrow}_{\varphi} = \{\vartheta \mid W(\vartheta) = (\mathcal{A}, \mathcal{S}, s)$ and $\mathcal{A} \models \varphi\}$. Finally, we add the following set of rules with
head predicate $\varphi$ to our datalog program:

$\varphi \leftarrow root(v),\ \vartheta(v).$

for every $\vartheta \in \Theta^{\uparrow}_{\varphi}$. We shall make use of these simplifications in Sections 5.1 and 5.2 when we present new
algorithms for two decision problems. In contrast, these simplifications are no longer possible when we
consider an enumeration problem in Section 5.3. In particular, the "top-down" construction will indeed be
required then.

5 Monadic Datalog at Work 

We now put monadic datalog to work by constructing several new algorithms. We start off with a simple
example, namely the 3-Colorability problem, which will help to illustrate the basic ideas, see Section 5.1.
Our ultimate goal is to tackle two more involved problems, namely the PRIMALITY decision problem
and the PRIMALITY enumeration problem, see Sections 5.2 and 5.3. All these problems are well-known
to be intractable. However, since they are expressible in MSO over appropriate structures, they are
fixed-parameter tractable w.r.t. the treewidth. In this section, we show that these problems admit succinct and
efficient solutions via datalog.

Before we present our datalog programs, we slightly modify the notion of normalized tree decompositions
from Section 2.2. Recall that an element replacement node replaces exactly one element in the bag
of the child node by a new element. For our algorithms, it is preferable to split this action into two steps,
namely, an element removal node, which removes one domain element from the bag of its child node, and
an element introduction node, which introduces one new element. Moreover, it is now preferable to consider
the bags as sets of domain elements rather than as tuples. Hence, we may delete permutation nodes
from the tree decomposition. Finally, we drop the condition that all bags in a tree decomposition of width
$w$ must have "full size" $w + 1$ (by splitting the element replacement into element removal and element
introduction, this condition would have required some relaxation anyway). Such a normal form of tree
decompositions was also considered in [23]. For instance, recall the tree decomposition $\mathcal{T}'$ from Figure 2. A
tree decomposition $\mathcal{T}''$ compliant with our modified notion of normalized tree decompositions is depicted
in Figure 4.

5.1 The 3-Colorability Problem 

Suppose that a graph $(V, E)$ with vertices $V$ and edges $E$ is given as a $\tau$-structure with $\tau = \{e\}$, i.e., $e$ is the
binary edge relation. This graph is 3-colorable, iff there exists a partition of $V$ into three sets $\mathcal{R}$, $\mathcal{G}$, $\mathcal{B}$, s.t.
no two adjacent vertices $v_1, v_2 \in V$ are in the same set $\mathcal{R}$, $\mathcal{G}$, or $\mathcal{B}$. This criterion can be easily expressed
by an MSO-sentence, namely
Figure 4: Modified normal form of tree decompositions.



Program 3-Colorability
/* leaf node. */
solve(s, R, G, B) ← leaf(s), bag(s, X), partition(s, R, G, B), allowed(s, R), allowed(s, G), allowed(s, B).
/* element introduction node. */
solve(s, R ⊎ {v}, G, B) ← bag(s, X ⊎ {v}), child_1(s_1, s), bag(s_1, X), solve(s_1, R, G, B), allowed(s, R ⊎ {v}).
solve(s, R, G ⊎ {v}, B) ← bag(s, X ⊎ {v}), child_1(s_1, s), bag(s_1, X), solve(s_1, R, G, B), allowed(s, G ⊎ {v}).
solve(s, R, G, B ⊎ {v}) ← bag(s, X ⊎ {v}), child_1(s_1, s), bag(s_1, X), solve(s_1, R, G, B), allowed(s, B ⊎ {v}).
/* element removal node. */
solve(s, R, G, B) ← bag(s, X), child_1(s_1, s), bag(s_1, X ⊎ {v}), solve(s_1, R ⊎ {v}, G, B).
solve(s, R, G, B) ← bag(s, X), child_1(s_1, s), bag(s_1, X ⊎ {v}), solve(s_1, R, G ⊎ {v}, B).
solve(s, R, G, B) ← bag(s, X), child_1(s_1, s), bag(s_1, X ⊎ {v}), solve(s_1, R, G, B ⊎ {v}).
/* branch node. */
solve(s, R, G, B) ← bag(s, X), child_1(s_1, s), child_2(s_2, s), bag(s_1, X), bag(s_2, X), solve(s_1, R, G, B),
                    solve(s_2, R, G, B).
/* result (at the root node). */
success ← root(s), solve(s, R, G, B).



Figure 5: 3-Colorability Test. 



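$\varphi \equiv \exists R\, \exists G\, \exists B\, [\, Partition(R, G, B) \wedge \forall v_1 \forall v_2\, [\, e(v_1, v_2) \rightarrow$
$(\neg R(v_1) \vee \neg R(v_2)) \wedge (\neg G(v_1) \vee \neg G(v_2)) \wedge (\neg B(v_1) \vee \neg B(v_2))\,]\,]$ with

$Partition(R, G, B) \equiv \forall v\, [\, (R(v) \vee G(v) \vee B(v)) \wedge$
$(\neg R(v) \vee \neg G(v)) \wedge (\neg R(v) \vee \neg B(v)) \wedge (\neg G(v) \vee \neg B(v))\,]$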
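To make the semantics of this sentence concrete, the following Python sketch evaluates it by brute force: assigning one of three labels to every vertex is the same as choosing the partition $(R, G, B)$, and the edge condition is then checked directly. The function name `three_colorable_bruteforce` is hypothetical; the sketch takes exponential time and is meant only as a reference semantics, not as the algorithm proposed in this section.

```python
from itertools import product


def three_colorable_bruteforce(vertices, edges):
    """Check the MSO criterion directly by trying every partition of the
    vertices into three sets (encoded as a label per vertex)."""
    verts = list(vertices)
    for labels in product("RGB", repeat=len(verts)):
        colour = dict(zip(verts, labels))  # a partition into R, G, B
        # no edge may have both endpoints in the same set
        if all(colour[u] != colour[v] for u, v in edges):
            return True
    return False
```

Such a checker is useful mainly for cross-validating a tree-decomposition-based implementation on small graphs.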

Suppose that a graph $(V, E)$ together with a tree decomposition $\mathcal{T}$ of width $w$ is given as a $\tau_{td}$-structure
with $\tau_{td} = \{e, root, leaf, child_1, child_2, bag\}$. In Figure 5, we describe a datalog program which takes such
a $\tau_{td}$-structure as input and decides if the graph thus represented is 3-colorable.

Some words on the notation used in this program are in order: We are using lower case letters s and v
(possibly with subscripts) as datalog variables for a single node in T and for a single vertex in V, respectively.
In contrast, upper case letters X, R, G, and B are used as datalog variables denoting sets of vertices.
Note that these sets are not sets in the general sense, since their cardinality is restricted by the size w + 1 of
the bags, where w is a fixed constant. Hence, these "fixed-size" sets can be simply implemented by means of
k-tuples with $k \le w + 1$ over {0, 1}. For the sake of readability, we are using non-datalog expressions with
the set operator $\uplus$ (disjoint union). For the fixed-size sets under consideration here, one could, of course,
easily replace this operator by "proper" datalog expressions of the form $\mathit{disjoint\_union}(R, \{v\}, R')$.
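
As an illustration of the remark on fixed-size sets, the following minimal Python sketch encodes a subset of a bag as a 0/1 tuple and implements the disjoint union as a partial operation. The names `singleton` and `disjoint_union` and the convention that position i of the tuple refers to the i-th element of the bag are assumptions made for this example; they are not taken from the authors' implementation.

```python
from typing import Optional, Tuple

# A subset of a bag with at most w + 1 elements, stored as one 0/1 entry
# per bag position (hypothetical encoding chosen for this sketch).
Bits = Tuple[int, ...]


def singleton(index: int, width: int) -> Bits:
    """Encode the one-element set {v}, where v sits at position `index`."""
    return tuple(1 if i == index else 0 for i in range(width))


def disjoint_union(r: Bits, v: Bits) -> Optional[Bits]:
    """Return the encoding of r ⊎ v, or None if r and v overlap.

    This plays the role of the relation disjoint_union(R, {v}, R'):
    the relation holds exactly when the result is not None.
    """
    if any(a and b for a, b in zip(r, v)):
        return None
    return tuple(a | b for a, b in zip(r, v))


if __name__ == "__main__":
    w = 2                       # width, so bags have at most 3 elements
    R = (1, 0, 0)               # R contains only the element at position 0
    print(disjoint_union(R, singleton(2, w + 1)))
    print(disjoint_union(R, singleton(0, w + 1)))
```

Since w is fixed, there are only constantly many such tuples, which is what allows the set-valued arguments to be folded into predicate names later on.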

It is convenient to introduce the following notation. Let G = (V, E) be the input graph with tree
decomposition T. For any node s in T, we write as usual $T_s$ to denote the subtree of T rooted at s.
Moreover, we write $V(s)$ and $V(T_s)$ to denote the vertices in the bag of s and in any bag in $T_s$, respectively.

Our 3-Colorability program checks if G is 3-colorable via the criterion mentioned above, i.e., there
exists a partition of V into three sets $\mathcal{R}$, $\mathcal{G}$, $\mathcal{B}$, s.t. no two adjacent vertices $v_1, v_2 \in V$ are in the same set $\mathcal{R}$,
$\mathcal{G}$, or $\mathcal{B}$.

At the heart of this program is the intensional predicate solve(s, R, G, B) with the following intended
meaning: s denotes a node in T and R, G, B are the projections of $\mathcal{R}$, $\mathcal{G}$, $\mathcal{B}$ onto $V(s)$. For all values
s, R, G, B, the ground fact solve(s, R, G, B) shall be in the least fixpoint of the program plus the input
structure, iff the following condition holds:

Property A. There exist extensions $\bar{R}$ of R, $\bar{G}$ of G, and $\bar{B}$ of B to $V(T_s)$, s.t.

1. $\bar{R}$, $\bar{G}$, and $\bar{B}$ form a partition of $V(T_s)$ and

2. no two adjacent vertices $v_1, v_2 \in V(T_s)$ are in the same set $\bar{R}$, $\bar{G}$, or $\bar{B}$.

In other words, $\bar{R}$, $\bar{G}$, and $\bar{B}$ is a valid 3-coloring of the vertices in $V(T_s)$ and R, G, and B are the projections
of $\bar{R}$, $\bar{G}$, and $\bar{B}$ onto $V(s)$.

The main task of the program is the computation of all facts solve{s, R, G, B) via a bottom-up traversal 
of the tree decomposition. The other predicates have the following meaning: 

• partition(s, R, G, B) is in the least fixpoint iff R, G, B is a partition of the bag X at node s in the
tree decomposition.

• allowed(s, X) is in the least fixpoint iff X contains no adjacent vertices $v_1, v_2$.

Recall that the cardinality of the sets X, R, G, B occurring as arguments of partition and allowed is
bounded by the fixed constant w + 1. In fact, both the partition predicate and the allowed predicate can be
treated as extensional predicates by computing all facts partition(s, R, G, B) and allowed(s, X) for each
node s in T as part of the computation of the tree decomposition. This additional computation also fits into
the linear time bound.

The intuition of the rules with the solve-predicate in the head is now clear: At the leaf nodes, the program
generates ground facts solve(s, R, G, B) for all possible partitions of the bag X at s, such that none of the
sets R, G, B contains two adjacent vertices. The three rules for element introduction nodes distinguish the
three cases according to whether the new vertex v is added to R, G, or B. Of course, due to the allowed-atom in the
body of these three rules, the attempt to add v to any of the sets R, G, or B may fail. The three rules for element
removal nodes distinguish the three cases according to whether the removed vertex was in R, G, or B. The rule
for branch nodes combines solve-facts with identical values of (R, G, B) at the child nodes $s_1$ and $s_2$ to the
corresponding solve-fact at s.
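
The bottom-up computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree decomposition is given as nested tuples `("leaf", bag)`, `("intro", v, child)`, `("forget", v, child)` and `("branch", left, right)`, edges are a set of two-element frozensets, and the function names are hypothetical. It mirrors the rules of the program (the leaf rule enumerates partitions directly instead of using a precomputed partition predicate) and is not the authors' implementation.

```python
from itertools import product


def allowed(X, edges):
    """True iff the vertex set X contains no two adjacent vertices."""
    return not any(frozenset((u, v)) in edges for u in X for v in X if u != v)


def solve(node, edges):
    """Return (bag, facts) for a node s of the tree decomposition.

    `facts` is the set of triples (R, G, B) such that solve(s, R, G, B)
    would be derived by the rules for the kind of node at hand.
    """
    kind = node[0]
    if kind == "leaf":
        bag = frozenset(node[1])
        verts = sorted(bag)
        facts = set()
        # Enumerate all partitions of the bag into three (possibly empty) sets.
        for colors in product(range(3), repeat=len(verts)):
            parts = tuple(
                frozenset(v for v, c in zip(verts, colors) if c == i)
                for i in range(3)
            )
            if all(allowed(p, edges) for p in parts):
                facts.add(parts)
        return bag, facts
    if kind == "intro":
        _, v, child = node
        bag, child_facts = solve(child, edges)
        facts = set()
        for parts in child_facts:
            for i in range(3):          # try to add v to R, G, or B
                grown = parts[i] | {v}
                if allowed(grown, edges):
                    facts.add(parts[:i] + (grown,) + parts[i + 1:])
        return bag | {v}, facts
    if kind == "forget":
        _, v, child = node
        bag, child_facts = solve(child, edges)
        # Dropping v from whichever set contains it covers all three removal rules.
        return bag - {v}, {tuple(p - {v} for p in parts) for parts in child_facts}
    if kind == "branch":
        _, left, right = node
        bag, left_facts = solve(left, edges)
        _, right_facts = solve(right, edges)
        return bag, left_facts & right_facts   # identical (R, G, B) at both children
    raise ValueError("unknown node kind: %r" % (kind,))


def three_colorable(root, edges):
    """The fact `success` holds iff some solve-fact exists at the root."""
    return bool(solve(root, edges)[1])
```

For example, a triangle on a, b, c with the decomposition `("intro", "c", ("intro", "b", ("leaf", ["a"])))` yields a non-empty set of facts at the root, whereas adding a fourth vertex adjacent to all three leaves no fact at the root.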

In summary, the 3-Colorability program has the following properties.

Theorem 5.1 The datalog program in Figure 5 decides the 3-Colorability problem, i.e., the fact "success"
is in the least fixpoint of this program plus the input $\tau_{td}$-structure $\mathcal{A}_{td}$ iff $\mathcal{A}_{td}$ encodes a 3-colorable graph
(V, E). Moreover, for any graph (V, E) with treewidth w, the computation of the $\tau_{td}$-structure $\mathcal{A}_{td}$ and the
evaluation of the program can be done in time $O(f(w) \cdot |(V, E)|)$ for some function f.

Proof. By the above considerations, it is clear that the predicate solve indeed has the meaning described
by Property A. A formal proof of this fact by structural induction on T is immediate and therefore omitted
here. Then the rule with head success reads as follows: success is in the least fixpoint, iff s denotes the root
of T and there exist extensions $\bar{R}$, $\bar{G}$, and $\bar{B}$ of R, G, B to $V(T_s)$ (which is identical to V in case of the root
node s), s.t. $\bar{R}$, $\bar{G}$, and $\bar{B}$ is a valid 3-coloring of the vertices in $V(T_s) = V$.

For the linear time data complexity, the crucial observation is that our program in Figure 5 is essentially
a succinct representation of a quasi-guarded monadic datalog program. For instance, in the atom
solve(s, R, G, B), the sets R, G, B are subsets of the bag of s. Hence, each combination R, G, B could be
represented by 3 subsets $r_1, r_2, r_3$ over {0, ..., w} referring to indices of elements in the bag of s. Recall
that w is a fixed constant. Hence, solve(s, R, G, B) is simply a succinct representation of constantly many
monadic predicates of the form $\mathit{solve}_{\langle r_1, r_2, r_3\rangle}(s)$. The quasi-guard in each rule can thus be any atom with
argument s, e.g., bag(s, X) (possibly extended by a disjoint union with {v}). Thus, the linear time bound
follows immediately from Theorem 4.4. □
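
To make the "succinct representation" argument concrete, the following Python sketch maps an atom solve(s, R, G, B) to the index triple that would label the corresponding monadic predicate. It assumes, purely for illustration, that a bag is stored as an ordered tuple so that each vertex has a position in {0, ..., w}; the function names are hypothetical. The count $8^{w+1}$ is simply the number of triples of subsets of a (w + 1)-element index set and thus an upper bound that depends only on w.

```python
def index_set(bag, S):
    """Positions (within the ordered bag) of the elements of S."""
    return tuple(i for i, v in enumerate(bag) if v in S)


def monadic_predicate(bag, R, G, B):
    """Label of the monadic predicate standing for solve(s, R, G, B).

    The node s itself stays the single argument of the predicate; the
    sets R, G, B are folded into the predicate name via their indices.
    """
    return ("solve", index_set(bag, R), index_set(bag, G), index_set(bag, B))


def number_of_predicates(w):
    """Upper bound on the number of such labels: (2^(w+1))^3 = 8^(w+1)."""
    return (2 ** (w + 1)) ** 3


if __name__ == "__main__":
    bag = ("x", "y", "z")                       # a bag of width w = 2
    print(monadic_predicate(bag, {"x"}, {"z"}, {"y"}))
    print(number_of_predicates(2))
```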

Discussion. Let us briefly compare the monadic program constructed in the proof of Theorem 4.5 with
the 3-Colorability program in Figure 5. Actually, since we are dealing with a decision problem here, we
only look at the bottom-up construction in the proof of Theorem 4.5, since the top-down construction is not
needed for a 0-ary target formula $\varphi()$. As was already mentioned in the proof of Theorem 5.1, the atoms
solve(s, R, G, B) can be thought of as a succinct representation for atoms of the form $\mathit{solve}_{\langle r_1, r_2, r_3\rangle}(s)$.
Now the question naturally arises where the type $\vartheta$ of some node s from the proof of Theorem 4.5 is present
in the 3-Colorability program. A first tentative answer is that this type essentially corresponds to the set
$R(s) = \{(r_1, r_2, r_3) \mid \mathit{solve}_{\langle r_1, r_2, r_3\rangle}(s)$ is in the least fixpoint$\}$. However, there are two significant aspects
which distinguish our 3-Colorability program from merely a succinct representation of the type transitions
encoded in the monadic datalog program of Theorem 4.5.

1. By Property A, we are only interested in the types of those structures which, in principle, could be
extended in bottom-up direction to a structure representing a 3-colorable graph. Hence,
in contrast to the construction in the proof of Theorem 4.5, our 3-Colorability program clearly does
not keep track of all possible types that the substructure induced by some subtree $T_s$ of the tree decomposition may
possibly have.

2. $R(s) = \{(r_1, r_2, r_3) \mid \mathit{solve}_{\langle r_1, r_2, r_3\rangle}(s)$ is in the least fixpoint$\}$ does not exactly correspond to the
type of s. Instead, it only describes the crucial properties of the type. Thus, the 3-Colorability program
somehow "aggregates" several types from the proof of Theorem 4.5.

These two properties ensure that the 3-Colorability program is much shorter than the program in the proof
of Theorem 4.5 and that the difference between these two programs is not just due to the succinct representation
of a monadic program by a non-monadic one. The deeper reason for this improvement is that we
take the target MSO formula $\varphi$ (namely, the characterization of 3-Colorability) into account for the entire
construction of the datalog program in Figure 5. In contrast, the rules describing the type transitions in the
proof of Theorem 4.5 for a bottom-up traversal of the tree decomposition are fully generic. Only the rules
with head predicate $\varphi$ are specific to the actual target MSO formula $\varphi$.

5.2 The Primality Decision Problem 

Recall from Section 2.2 that we represent a relational schema (R, F) as a $\tau$-structure with $\tau = \{\mathit{fd}, \mathit{att}, \mathit{lh}, \mathit{rh}\}$.
Moreover, recall that, in Section 5, we consider normalized tree decompositions with element removal
nodes and element introduction nodes rather than element replacement nodes as in Section 2.2. With our
representation of relational schemas (R, F) as finite structures, the domain elements are the attributes and FDs
in (R, F). Hence, in total, the former element replacement nodes give rise to four kinds of nodes, namely,
attribute removal nodes, FD removal nodes, attribute introduction nodes, and FD introduction nodes.
Moreover, we now consider the bags as a pair of sets (At, Fd), where At is a set of attributes and Fd is a set of
FDs. Again, we may delete permutation nodes from the tree decomposition. Finally, it will greatly simplify
the presentation of our datalog program if we require that, whenever an FD $f \in F$ is contained in a bag of
the tree decomposition, then the attribute $\mathit{rhs}(f)$ is as well. In the worst case, this may double the width of
the resulting decomposition.

Suppose that a schema (R, F) together with a tree decomposition T of width w is given as a $\tau_{td}$-structure
with $\tau_{td} = \{\mathit{fd}, \mathit{att}, \mathit{lh}, \mathit{rh}, \mathit{root}, \mathit{leaf}, \mathit{child}_1, \mathit{child}_2, \mathit{bag}\}$. In Figure 6, we describe a datalog
program, where the input is given as an attribute $a \in R$ and a $\tau_{td}$-structure, s.t. a occurs in the bag at the
root of the tree decomposition.



Program PRIMALITY

/* leaf node. */
solve(s, Y, FY, C^o, AC, FC) ← leaf(s), bag(s, At, Fd), Y ∪ C^o = At, Y ∩ C^o = ∅, outside(FY, Y, At, Fd),
    FC ⊆ Fd, consistent(FC, C^o), AC = {rhs(f) | f ∈ FC}, AC ⊆ C^o.
/* attribute introduction node. */
solve(s, Y ⊎ {b}, FY, C^o, AC, FC) ← bag(s, At ⊎ {b}, Fd), child1(s1, s), bag(s1, At, Fd),
    solve(s1, Y, FY, C^o, AC, FC).
solve(s, Y, FY, C^o ⊎ {b}, AC, FC) ← bag(s, At ⊎ {b}, Fd), child1(s1, s), bag(s1, At, Fd),
    consistent(FC, C^o ⊎ {b}), solve(s1, Y, FY1, C^o, AC, FC), outside(FY2, Y, At, Fd), FY = FY1 ∪ FY2.
/* FD introduction node. */
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd ⊎ {f}), child1(s1, s), bag(s1, At, Fd), rh(b, f), b ∈ Y,
    solve(s1, Y, FY, C^o, AC, FC).
solve(s, Y, FY, C^o, AC ⊎ {b}, FC ⊎ {f}) ← bag(s, At, Fd ⊎ {f}), child1(s1, s), bag(s1, At, Fd), rh(b, f),
    b ∈ C^o, solve(s1, Y, FY1, C^o, AC, FC), consistent({f}, C^o), outside(FY2, Y, At, {f}), FY = FY1 ∪ FY2.
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd ⊎ {f}), child1(s1, s), bag(s1, At, Fd), rh(b, f), b ∈ C^o,
    solve(s1, Y, FY1, C^o, AC, FC), outside(FY2, Y, At, {f}), FY = FY1 ∪ FY2.
/* attribute removal node. */
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At ⊎ {b}, Fd),
    solve(s1, Y ⊎ {b}, FY, C^o, AC, FC).
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At ⊎ {b}, Fd),
    solve(s1, Y, FY, C^o ⊎ {b}, AC ⊎ {b}, FC).
/* FD removal node. */
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At, Fd ⊎ {f}), rh(b, f), b ∈ Y,
    solve(s1, Y, FY, C^o, AC, FC).
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At, Fd ⊎ {f}), rh(b, f), b ∈ C^o,
    solve(s1, Y, FY ⊎ {f}, C^o, AC, FC ⊎ {f}).
solve(s, Y, FY, C^o, AC, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At, Fd ⊎ {f}), rh(b, f), b ∈ C^o,
    solve(s1, Y, FY ⊎ {f}, C^o, AC, FC), f ∉ FC.
/* branch node. */
solve(s, Y, FY1 ∪ FY2, C^o, AC1 ∪ AC2, FC) ← bag(s, At, Fd), child1(s1, s), bag(s1, At, Fd),
    child2(s2, s), bag(s2, At, Fd), solve(s1, Y, FY1, C^o, AC1, FC),
    solve(s2, Y, FY2, C^o, AC2, FC), unique(AC1, AC2, FC).
/* result (at the root node). */
success ← root(s), bag(s, At, Fd), a ∈ At, solve(s, Y, FY, C^o, AC, FC), a ∉ Y,
    FY = {f ∈ Fd | rhs(f) ∉ Y}, AC = C^o \ {a}.



Figure 6: Primality Test. 



Analogously to Section 5.1, we are using lower case letters s, f, and b (possibly with subscripts) as
datalog variables for a single node in T, for a single FD, or for a single attribute in R, respectively. Upper
case letters are used as datalog variables denoting sets of attributes (in the case of Y, At, $C^o$, AC) or sets
of FDs (in the case of Fd, FY, FC). In addition, $C^o$ is considered as an ordered set (indicated by the
superscript o). When we write $C^o \uplus \{b\}$, we mean that b is arbitrarily "inserted" into $C^o$, leaving the order
of the remaining elements unchanged. Again, the cardinality of these (ordered) sets is restricted by the size
w + 1 of the bags, where w is a fixed constant. In addition to $\uplus$ (disjoint union) we are now also using the set
operators $\cup$, $\cap$, $\subseteq$, and $\in$. For the fixed-size (ordered) sets under consideration here, one could, of course,
easily replace these operators by "proper" datalog expressions. Moreover, for the input schema (R, F) with
tree decomposition T we use the following notation: We write $FD(s)$ to denote the FDs in the bag of s
and $FD(T_s)$ to denote the FDs that occur in any bag in $T_s$. Analogously, we write $Att(s)$ and $Att(T_s)$ as a
short-hand for the attributes occurring in the bag of s and in any bag in $T_s$, respectively.

Our PRIMALITY program checks the primality of a via the criterion used for the MSO characterization
in Example 2.6: We have to search for an attribute set $\mathcal{Y} \subseteq R$, s.t. $\mathcal{Y}$ is closed w.r.t. F (i.e., $\mathcal{Y}^+ = \mathcal{Y}$),
$a \notin \mathcal{Y}$ and $(\mathcal{Y} \cup \{a\})^+ = R$, i.e., $\mathcal{Y} \cup \{a\}$ is a superkey but $\mathcal{Y}$ is not.
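
The criterion itself can be checked by brute force on small schemas, which is useful as a reference when testing the tree-decomposition-based program. The following Python sketch assumes that FDs are given as pairs (lhs, rhs) with a set-valued left-hand side and a single right-hand-side attribute; the function names are hypothetical. It enumerates all candidate sets and is therefore exponential in |R|, so it illustrates only the criterion and not the linear-time algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ w.r.t. FDs given as (lhs_set, rhs_attr) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in closed and lhs <= closed:
                closed.add(rhs)
                changed = True
    return frozenset(closed)


def is_prime_by_criterion(a, R, fds):
    """True iff there is a closed Y with a not in Y and (Y ∪ {a})+ = R."""
    others = sorted(R - {a})            # candidates for Y never contain a
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            Y = frozenset(combo)
            if closure(Y, fds) == Y and closure(Y | {a}, fds) == R:
                return True
    return False


if __name__ == "__main__":
    R = frozenset("abc")
    fds = [(frozenset("a"), "b"), (frozenset("b"), "c")]   # a -> b, b -> c
    for attr in sorted(R):
        print(attr, is_prime_by_criterion(attr, R, fds))
```

In this small schema the only key is {a}, so the check succeeds for a (with the empty set as witness) and fails for b and c.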

At the heart of our PRIMALITY program is the intensional predicate $\mathit{solve}(s, Y, FY, C^o, AC, FC)$ with
the following intended meaning: s denotes a node in T. Y (resp. $C^o$) is the projection of $\mathcal{Y}$ (resp. of $R \setminus \mathcal{Y}$)
onto $Att(s)$. We consider $R \setminus \mathcal{Y}$ as ordered w.r.t. an appropriate derivation sequence of R from $\mathcal{Y} \cup \{a\}$,
i.e., suppose that $\mathcal{Y} \cup \{A_0\} \rightarrow \mathcal{Y} \cup \{A_0, A_1\} \rightarrow \mathcal{Y} \cup \{A_0, A_1, A_2\} \rightarrow \dots \rightarrow \mathcal{Y} \cup \{A_0, A_1, \dots, A_n\}$, s.t.
$A_0 = a$ and $\mathcal{Y} \cup \{A_0, A_1, \dots, A_n\} = R$. W.l.o.g., the $A_i$'s may be assumed to be pairwise distinct. Then
for any two $i \neq j$, we simply set $A_i < A_j$ iff $i < j$. By the connectedness condition on T, our datalog
program ensures that the order on each subset $C^o$ of $R \setminus \mathcal{Y}$ is consistent with the overall ordering.

The argument FY of the solve-predicate is used to guarantee that $\mathcal{Y}$ is indeed closed. Informally, FY
contains those FDs in $FD(s)$ for which we have already verified (on the bottom-up traversal of the tree
decomposition) that they do not constitute a contradiction with the closedness of $\mathcal{Y}$. In other words, either
$\mathit{rhs}(f) \in \mathcal{Y}$ or there exists an attribute in $\mathit{lhs}(f) \cap Att(T_s)$ which is not in $\mathcal{Y}$.

The arguments AC and FC of the solve-predicate are used to ensure that $(\mathcal{Y} \cup \{a\})^+ = R$ indeed holds:
The intended meaning of the set FC is that it contains those FDs in $FD(s)$ which are used in the above
derivation sequence. Moreover, AC contains those attributes from $Att(s)$ for which we have already shown
that they can be derived from $\mathcal{Y}$ plus smaller attributes in $C^o$.

More precisely, for all values s, Y, FY, $C^o$, AC, FC, the ground fact $\mathit{solve}(s, Y, FY, C^o, AC, FC)$ shall
be in the least fixpoint of the program plus the input structure, iff the following condition holds:

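Property B. There exist extensions $\bar{Y}$ of Y and $\bar{C}^o$ of $C^o$ to $Att(T_s)$ and an extension $\overline{FC}$ of FC to
$FD(T_s)$, s.t.

1. $\bar{Y}$ and $\bar{C}^o$ form a partition of $Att(T_s)$,

2. $\forall f \in FD(T_s) \setminus FD(s)$, if $\mathit{rhs}(f) \notin \bar{Y}$, then $\mathit{lhs}(f) \not\subseteq \bar{Y}$. Moreover, $FY = \{f \in FD(s) \mid \mathit{rhs}(f) \notin
\bar{Y}$ and $\mathit{lhs}(f) \cap Att(T_s) \not\subseteq \bar{Y}\}$,

3. $\forall f \in \overline{FC}$, f is consistent with the order on $\bar{C}^o$, i.e., $\forall f \in \overline{FC}$: $\mathit{rhs}(f) \in \bar{C}^o$ and $\forall b \in \mathit{lhs}(f) \cap \bar{C}^o$:
$b < \mathit{rhs}(f)$ holds,

4. $AC \cup (\bar{C}^o \setminus Att(s)) = \{\mathit{rhs}(f) \mid f \in \overline{FC}\}$.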

The main task of the program is the computation of all facts $\mathit{solve}(s, Y, FY, C^o, AC, FC)$ by means of a
bottom-up traversal of the tree decomposition. The other predicates have the following meaning:

• $\mathit{outside}(FY, Y, At, Fd)$ is in the least fixpoint iff $FY = \{f \in Fd \mid \mathit{rhs}(f) \notin Y$ and $\mathit{lhs}(f) \cap At \not\subseteq
Y\}$, i.e., for every $f \in FY$, $\mathit{rhs}(f)$ is outside Y but this will never conflict with the closedness of $\mathcal{Y}$
because $\mathit{lhs}(f)$ contains an attribute from outside $\mathcal{Y}$.

• $\mathit{consistent}(FC, C^o)$ is in the least fixpoint iff $\forall f \in FC$ we have $\mathit{rhs}(f) \in C^o$ and $\forall b \in \mathit{lhs}(f) \cap C^o$:
$b < \mathit{rhs}(f)$, i.e., the FDs in FC are only used to derive greater attributes from smaller ones (plus
attributes from $\mathcal{Y}$).

• The fact $\mathit{unique}(AC_1, AC_2, FC)$ is in the least fixpoint iff the condition $AC_1 \cap AC_2 = \{b \mid b =
\mathit{rhs}(f)$ for some $f \in FC\}$ holds. The unique-predicate is only used in the body of the rule for branch
nodes. Its purpose is to avoid that an attribute in $R \setminus \mathcal{Y}$ is derived via two different FDs in the two
subtrees at the child nodes of the branch node.

• The 0-ary predicate success indicates if the fixed attribute a is prime in the schema encoded by the
input structure.
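
The three auxiliary predicates are simple set computations on bag-sized arguments. A minimal Python sketch follows, under the assumptions that FDs are identified by names, that the hypothetical dictionaries `lhs` and `rhs` map an FD name to its left-hand-side set and its right-hand-side attribute, and that the ordered set $C^o$ is passed as a list. The function `outside` returns the unique set FY for which the atom holds instead of testing a given FY.

```python
def outside(Y, At, Fd, lhs, rhs):
    """The set FY with outside(FY, Y, At, Fd): FDs whose right-hand side is
    not in Y and whose left-hand side has an attribute in At outside Y."""
    return frozenset(
        f for f in Fd
        if rhs[f] not in Y and not (lhs[f] & At) <= Y
    )


def consistent(FC, C_ordered, lhs, rhs):
    """consistent(FC, C^o): every f in FC has rhs(f) in C^o, and every
    attribute of lhs(f) that lies in C^o is strictly smaller than rhs(f)."""
    pos = {b: i for i, b in enumerate(C_ordered)}
    return all(
        rhs[f] in pos
        and all(pos[b] < pos[rhs[f]] for b in lhs[f] if b in pos)
        for f in FC
    )


def unique(AC1, AC2, FC, rhs):
    """unique(AC1, AC2, FC): the attributes derived in both subtrees are
    exactly the right-hand sides of the FDs in FC."""
    return (AC1 & AC2) == {rhs[f] for f in FC}


if __name__ == "__main__":
    lhs = {"f1": {"a"}, "f2": {"b"}}      # f1: a -> b, f2: b -> c
    rhs = {"f1": "b", "f2": "c"}
    print(outside({"c"}, {"a", "b", "c"}, {"f1", "f2"}, lhs, rhs))
    print(consistent({"f1", "f2"}, ["a", "b", "c"], lhs, rhs))
    print(unique({"b"}, {"b", "c"}, {"f1"}, rhs))
```

Because all arguments are bounded by the bag size, each of these checks takes time depending only on w, which is consistent with treating them as precomputed extensional predicates.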

The PRIMALITY-program has the following properties. 

Lemma 5.2 The solve-predicate has the intended meaning described above, i.e., for all values s, Y, FY,
$C^o$, AC, FC, the ground fact $\mathit{solve}(s, Y, FY, C^o, AC, FC)$ is in the least fixpoint of the PRIMALITY
program plus the input structure, iff Property B holds.

Proof Sketch. The lemma can be shown by structural induction on T. We restrict ourselves here to outlining
the ideas underlying the various rules of the PRIMALITY program. The induction itself is then obvious and
therefore omitted.

(1) leaf nodes. The rule for a leaf node s realizes two "guesses" so to speak: (i) a partition of $Att(s)$ into
Y and $C^o$ together with an ordering on $C^o$ and (ii) the subset $FC \subseteq FD(s)$ of FDs which are used in the
derivation sequence of $R \setminus \mathcal{Y}$ from $\mathcal{Y} \cup \{a\}$. The remaining variables are thus fully determined: FY is
determined via the outside-predicate, while AC is determined via the equality $AC = \{\mathit{rhs}(f) \mid f \in FC\}$.
Finally, the body of the rule contains the checks $\mathit{consistent}(FC, C^o)$ and $AC \subseteq C^o$ to make sure that (at
least at the leaf node s) the "guesses" are allowed.

(2) attribute introduction node. The two rules distinguish the two cases according to whether the new attribute b is
added to Y or to $C^o$. If b is added to Y then all arguments of the solve-fact at the child node $s_1$ of s remain
unchanged at s. In contrast, if b is inserted into $C^o$ then the following actions are required:

The atom $\mathit{consistent}(FC, C^o \uplus \{b\})$ makes sure that the FDs in FC are consistent with the ordering
of $C^o \uplus \{b\}$, i.e., it must not happen that the new attribute b occurs in $\mathit{lhs}(f)$ for some $f \in FC$, s.t. $b > \mathit{rhs}(f)$
holds.

The new attribute b outside Y may possibly allow us to verify for some additional FDs that they do not
contradict the closedness of $\mathcal{Y}$. The atom $\mathit{outside}(FY_2, Y, At, Fd)$ determines the set $FY_2$ which contains
all FDs with $\mathit{rhs}(f) \notin Y$ but with some attribute from $C^o$ (in particular, the new attribute b) in $\mathit{lhs}(f)$.

Recall that we are requiring that, whenever an FD $f \in F$ is contained in a bag of the tree decomposition,
then the attribute $\mathit{rhs}(f)$ is as well. Hence, since the attribute b has just been introduced on our bottom-up
traversal of the tree decomposition, we can be sure that b does not occur on the right-hand side of any FD in
the bag of s. Thus, AC is not affected by the transition from $s_1$ to s.

(3) FD introduction node. The three rules distinguish, in total, 3 cases: First, does $\mathit{rhs}(f) \in Y$ or $\mathit{rhs}(f) \in
C^o$ hold? (Recall that we assume that every bag containing some FD also contains the right-hand side of
this FD.) The latter case is then further divided into two subcases according to whether f is used for the derivation of $R \setminus \mathcal{Y}$ or
not. The first rule deals with the case $\mathit{rhs}(f) \in Y$. Then all arguments of the solve-fact at the child node $s_1$
of s remain unchanged at s.

The second rule addresses the case that $\mathit{rhs}(f) \in C^o$ and f is used for the derivation of $R \setminus \mathcal{Y}$. Then the
attribute $\mathit{rhs}(f)$ is added to AC. The disjoint union makes sure that this attribute has not yet been derived
by another FD with the same right-hand side. The atom $\mathit{consistent}(\{f\}, C^o)$ is used to check the
consistency of f with the ordering of $C^o$. The atom $\mathit{outside}(FY_2, Y, At, \{f\})$ is used to check if f may be
added to FY, i.e., if some attribute in $\mathit{lhs}(f)$ is in $C^o$.

The third rule refers to the case that $\mathit{rhs}(f) \in C^o$ and f is not used for the derivation of $R \setminus \mathcal{Y}$. Again,
the atom $\mathit{outside}(FY_2, Y, At, \{f\})$ is used to check if f may be added to FY.

(4) attribute removal node. The two rules distinguish the two cases according to whether the attribute b was in Y
or in $C^o$. If b was in Y then all arguments of the solve-fact at the child node $s_1$ of s remain unchanged at
s. In contrast, if b was in $C^o$ then we have to check (by pattern matching with the fact $\mathit{solve}(s_1, \dots, AC \uplus
\{b\}, \dots)$) that an FD f for deriving b has already been found. Recall that, on our bottom-up traversal of T,
when we first encounter an attribute b, it is either added to Y or to $C^o$. If b is added to $C^o$ then we eventually
have to determine the FD by which b is derived. Hence, initially, b is in $C^o$ but not in AC. However, when
b is finally removed from the bag then its derivation must have been verified. The arguments Y, FY, and
FC are of course not affected by this attribute removal.

(5) FD removal node. Similarly to the FD introduction node, we distinguish, in total, 3 cases. If $\mathit{rhs}(f) \in Y$
then all arguments of the solve-fact at the child node $s_1$ of s remain unchanged at s. If $\mathit{rhs}(f) \in C^o$ then
we further distinguish two subcases according to whether f is used for the derivation of $R \setminus \mathcal{Y}$ or not. The second and third
rule refer to these two subcases. The action carried out by these two rules is the same, namely it has to
be checked (by pattern matching with the fact $\mathit{solve}(s_1, \dots, FY \uplus \{f\}, \dots)$) that f does not constitute a
contradiction with the closedness of $\mathcal{Y}$. In other words, since $\mathit{rhs}(f) \in C^o$, we must have encountered (on
our bottom-up traversal of T) an attribute in $\mathit{lhs}(f) \setminus \mathcal{Y}$.

(6) branch node. Recall that a branch node s and its two child nodes $s_1$ and $s_2$ have identical bags by our
notion of normalized tree decompositions. The arguments of the solve-fact at s are then determined from the
arguments at $s_1$ and $s_2$ as follows: The arguments Y and $C^o$ must have the same value at all three nodes s,
$s_1$, and $s_2$. Likewise, FC (containing the FDs from the bags at these nodes which are used in the derivation
of $R \setminus \mathcal{Y}$) must be identical. In contrast, FY and AC are obtained as the union of the corresponding
arguments in the solve-facts at the child nodes $s_1$ and $s_2$, i.e., it suffices to verify at one of the child nodes
$s_1$ or $s_2$ that some FD does not contradict the closedness of $\mathcal{Y}$ and that some attribute in $C^o$ is derived by
some FD.

Recall that we define an order on the attributes in $R \setminus \mathcal{Y}$ by means of some derivation sequence of $R \setminus \mathcal{Y}$
from $\mathcal{Y} \cup \{a\}$. Hence, we have to make sure that every attribute in $R \setminus \mathcal{Y}$ is derived only once in this
derivation sequence. In other words, for every $b \in R \setminus (\mathcal{Y} \cup \{a\})$, we use exactly one FD f with $\mathit{rhs}(f) = b$
in our derivation sequence. The atom $\mathit{unique}(AC_1, AC_2, FC)$ in the rule body ensures that no attribute in
$R \setminus \mathcal{Y}$ is derived via two different FDs in the two subtrees at the child nodes of the branch node. □

Theorem 5.3 The datalog program in Figure 6 decides the PRIMALITY problem for a fixed attribute a, i.e.,
the fact "success" is in the least fixpoint of this program plus the input $\tau_{td}$-structure $\mathcal{A}_{td}$ iff $\mathcal{A}_{td}$ encodes
a relational schema (R, F), s.t. a is part of a key. Moreover, for any schema (R, F) with treewidth w,
the computation of the $\tau_{td}$-structure $\mathcal{A}_{td}$ and the evaluation of the program can be done in time
$O(f(w) \cdot |(R, F)|)$ for some function f.

Proof. By Lemma 5.2, the predicate solve indeed has the meaning according to Property B. Thus, the rule
with head success reads as follows: success is in the least fixpoint, iff s denotes the root of T, a is an
attribute in the bag at s, and Y is the projection of the desired attribute set $\mathcal{Y}$ onto $Att(s)$, i.e., (1) $\mathcal{Y}$ is
closed (this is ensured by the condition that $\{f \in Fd \mid \mathit{rhs}(f) \notin Y\} = FY$), (2) $a \notin \mathcal{Y}$ and, finally,
(3) all attributes in $R \setminus (\mathcal{Y} \cup \{a\})$ are indeed determined by $\mathcal{Y} \cup \{a\}$ (this is ensured by the condition
$AC = C^o \setminus \{a\}$).

The linear time data complexity is due to the same argument as in the proof of Theorem 5.1: our
program in Figure 6 is essentially a succinct representation of a quasi-guarded monadic datalog program.
For instance, in the atom $\mathit{solve}(s, Y, FY, C^o, AC, FC)$, the (ordered) sets Y, FY, $C^o$, AC, and FC are
subsets of the bag of s. Hence, each combination Y, FY, $C^o$, AC, FC could be represented by 5 subsets
resp. tuples $r_1, \dots, r_5$ over {0, ..., w} referring to indices of elements in the bag of s. Recall that w is
a fixed constant. Hence, $\mathit{solve}(s, Y, FY, C^o, AC, FC)$ is simply a succinct representation of constantly
many monadic predicates of the form $\mathit{solve}_{\langle r_1, \dots, r_5\rangle}(s)$. The quasi-guard in each rule can thus be any atom
with argument s, e.g., bag(s, At, Fd) (possibly extended by a disjoint union with {b} or {f}, respectively).
Thus, the linear time bound follows immediately from Theorem 4.4. □

5.3 The Primality Enumeration Problem 

In order to extend the Primality algorithm from the previous section to a monadic predicate selecting all
prime attributes in a schema, a naive first attempt might look as follows: one can consider the tree decomposition
T as rooted at various nodes, s.t. each $a \in R$ is contained in the bag of one such root node. Then,
for each a and corresponding tree decomposition T, we run the algorithm from Figure 6. Obviously, this
method has quadratic time complexity w.r.t. the data size. However, in this section, we describe a linear
time algorithm.

The idea of this algorithm is to implement a top-down traversal of the tree decomposition in addition
to the bottom-up traversal realized by the program in Figure 6. For this purpose, we modify our notion
of normalized tree decompositions in the following way: First, any tree decomposition can of course be
transformed in such a way that every attribute $a \in R$ occurs in at least one leaf node of T. Moreover, for
every branch node s in the tree decomposition, we insert a new node u as new parent of s, s.t. u and s have
identical bags. Hence, together with the two child nodes of s, each branch node is "surrounded" by three
neighboring nodes with identical bags. It is thus guaranteed that a branch node always has two child nodes
with identical bags, no matter where T is rooted. Moreover, this insertion of a new node also implies that
the root node of T is not a branch node.

Table 1: Processing Time in ms for PRIMALITY.



We propose the following algorithm for computing a monadic predicate prime(), which selects precisely
the prime attributes in (R, F). In addition to the predicate solve, whose meaning was described by Property
B in Section 5.2, we also compute a predicate solve↓, whose meaning is described by replacing every
occurrence of $T_s$ in Property B by $\bar{T}_s$. As the notation solve↓ suggests, the computation of solve↓ can be
done via a top-down traversal of T. Note that solve↓(s, ...) for a leaf node s of T is exactly the same as if
we computed solve(s, ...) for the tree rooted at s. Hence, we can define the predicate prime() as follows.



Program Monadic-Primality

prime(a) ← leaf(s), bag(s, At, Fd), a ∈ At, solve↓(s, Y, FY, C^o, AC, FC), a ∉ Y,
    FY = {f ∈ Fd | rhs(f) ∉ Y}, AC = C^o \ {a}.



By the intended meaning of solve↓ and by the properties of the Primality algorithm in Section 5.2, we
immediately get the following result.

Theorem 5.4 The monadic predicate prime() as defined above selects precisely the prime attributes. Moreover,
it can be computed in linear time w.r.t. the size of the input structure.

6 Implementation and Results 

To test our new datalog programs in terms of their scalability with a large number of attributes and rules,
we have implemented the Primality program from Section 5.2 in C++. The experiments were conducted on
Linux kernel 2.6.17 with a 1.60GHz Intel Pentium(M) processor and 512 MB of memory. We measured
the processing time of the Primality program on different input parameters such as the number of attributes
and the number of FDs. The treewidth in all the test cases was 3.

Test data generation. Due to the lack of available test data, we generated a balanced normalized tree 
decomposition. Test data sets with increasing input parameters are then generated by expanding the tree in 
a depth-first style. We have ensured that all different kinds of nodes occur evenly in the tree decomposition. 

Experimental results. The outcome of the tests is shown in Table 1, where tw stands for the treewidth;
#Att, #FD, and #tn stand for the number of attributes, FDs, and tree nodes, respectively. The processing
times (in ms) obtained with our C++ implementation following the monadic datalog program in Section 5.3
are displayed in the column labelled "MD". The measurements nicely reflect an essentially linear increase
of the processing time with the size of the input. Moreover, there is obviously no big "hidden" constant
which would render the linearity useless.

In [19], we proved the FPT of several non-monotonic reasoning problems via Courcelle's Theorem.
Moreover, we also carried out some experiments with a prototype implementation using MONA (see [22])
for the MSO model checking. We have now extended these experiments with MONA to the PRIMALITY
problem. The time measurements of these experiments are shown in the last column of Table 1. Due to
problems discussed in [11], MONA does not ensure linear data complexity. Hence, all tests below line 3 of
the table failed with "out-of-memory" errors. Moreover, also in cases where the exponential data complexity
does not yet "hurt", our datalog approach outperforms the MSO-to-FTA approach by a factor of 1000 or
even more.

Optimizations. In our implementation, we have realized several optimizations, which are highlighted 
below. 

(1) Succinct representation by non-monadic datalog. As was mentioned in the proofs of
Theorems 5.1 and 5.3, our datalog programs can be regarded as succinct representations of big monadic datalog
programs. If all possible ground instances of our datalog rules had to be materialized, then we would end
up with a ground program of the same size as with the equivalent monadic program. However, it turns out
that the vast majority of possible instantiations is never computed since they are not "reachable" along the
bottom-up computation.

(2) General optimizations and lazy grounding. In principle, our implementation is based on the general
idea of grounding followed by an evaluation of the ground program. This corresponds to the general technique
to ensure linear time data complexity, cf. Theorem 4.4. A further improvement is achieved by the
natural idea of generating only those ground instances of rules which actually produce new facts.
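
For illustration, the evaluation of a ground (propositional) Horn program in time linear in its size can be sketched with the classical counter-based forward chaining. This is a generic textbook technique shown under stated assumptions, not a description of the authors' C++ code: rules are given as pairs (head, body) over hashable atoms, and the function name `least_fixpoint` is hypothetical. Each rule keeps a counter of body atoms not yet derived and fires once the counter reaches zero, so every rule body is inspected only a bounded number of times.

```python
from collections import defaultdict, deque


def least_fixpoint(facts, rules):
    """Least fixpoint of ground Horn rules.

    facts: iterable of atoms known to hold.
    rules: list of (head, body) pairs, where body is a list of atoms.
    """
    # Number of distinct body atoms of each rule that are still underived.
    remaining = [len(set(body)) for _, body in rules]
    # For each atom, the rules in whose body it occurs.
    watchers = defaultdict(list)
    for i, (_, body) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    derived = set()
    queue = deque(facts)
    # Rules with an empty body fire immediately.
    for head, body in rules:
        if not body:
            queue.append(head)

    while queue:
        atom = queue.popleft()
        if atom in derived:
            continue
        derived.add(atom)
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0:       # all body atoms derived: rule fires
                queue.append(rules[i][0])
    return derived


if __name__ == "__main__":
    rules = [("b", ["a"]), ("c", ["a", "b"]), ("d", ["e"])]
    print(sorted(least_fixpoint(["a"], rules)))
```

A lazy variant in the spirit of the optimization above would create rule instances only when a newly derived fact can match one of their body atoms, rather than materializing all ground rules up front.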

(3) Problem-specific optimizations of the non-monadic datalog programs. In the discussion below
Theorem 5.1, we have already mentioned that the datalog programs presented in Section 5 incorporate several
problem-specific optimizations. The underlying idea of these optimizations is that many transitions which
are kept track of by the generic construction in the proof of Theorem 4.5 (and, likewise, in the MSO-to-FTA
approach) will not lead to a solution anyway. Hence, they are omitted in our datalog programs right from
the beginning.

(4) Language extensions. As was mentioned in Section 5, we are using language constructs (in particular,
for handling sets of attributes and FDs) which are not part of the datalog language. In principle, they could
be realized in datalog. Nevertheless, we preferred an efficient implementation of these constructs directly
on C++ level. Further language extensions are conceivable and easy to realize.

(5) Further improvements. We are planning to implement further improvements. For instance, we are
currently applying a strict bottom-up strategy as we compute new facts solve(s, ...). However, some
top-down guidance in the style of magic sets, so as not to compute all possible such facts at each level, would
be desirable. Note that ultimately, at the root, only facts fulfilling certain conditions (like $a \notin Y$, etc.) are
needed in case that an attribute a is indeed prime.

7 Conclusion 

In this work, we have proposed a new approach based on monadic datalog to tackle a large class of fixed- 
parameter tractable problems. Theoretically, we have shown that every MSO-definable unary query over 
finite structures with bounded treewidth is also definable in monadic datalog. In fact, the resulting program 
even lies in a particularly efficient fragment of monadic datalog. Practically, we have put this approach to 
work by applying it to the 3-Colorability problem and the PRIMALITY problem with bounded treewidth. 
The experimental results thus obtained look very promising. They underline that datalog with its potential 
for optimizations and its flexibility is clearly worth considering for this class of problems. 

Recall that the PRIMALITY problem is closely related to an important problem in the area of artificial 
intelligence, namely the relevance problem of propositional abduction (i.e., given a system description in 
form of a propositional clausal theory and observed symptoms, one has to decide if some hypothesis is part 
of a possible explanation of the symptoms). Indeed, if the clausal theory is restricted to definite Horn clauses 
and if we are only interested in minimal explanations, then the relevance problem is basically the same as the 
problem of deciding primality in a subschema R' ⊆ R. Extending our prime() program (and, in particular, 
the solve()-predicate) from Section 5 so as to test primality in a subschema is rather straightforward. On the 
other hand, extending such a program to abduction with arbitrary clausal theories (which is on the second 
level of the polynomial hierarchy, see [10]) is much more involved. A monadic datalog program solving the 
relevance problem also in this general case was presented in [20]. 
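
For reference, the plain PRIMALITY problem recalled above can be stated as a naive check in a few lines of Python. This sketch is not the datalog program of Section 5: it enumerates attribute subsets and is therefore exponential in the size of the schema, and it does not cover the subschema variant. The representation of FDs as (left-hand side, single right-hand attribute) pairs and the names closure and is_prime are assumptions made for illustration only.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under the FDs fds, given as
    (lhs, rhs) pairs with lhs a frozenset and rhs a single attribute."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if rhs not in result and lhs <= result:
                result.add(rhs)
                changed = True
    return result

def is_prime(a, schema, fds):
    """Naive test whether attribute a occurs in some key of (schema, fds)."""
    schema = set(schema)
    others = sorted(schema - {a})
    for k in range(len(others) + 1):
        for rest in combinations(others, k):
            cand = set(rest) | {a}
            if closure(cand, fds) != schema:
                continue          # cand is not even a superkey
            # A superkey is a key iff dropping any single attribute
            # destroys the superkey property (closure is monotone).
            if all(closure(cand - {b}, fds) != schema for b in cand):
                return True
    return False

if __name__ == "__main__":
    # Hypothetical schema R = abc with FDs a -> b and b -> c; the only key is a.
    fds = [(frozenset("a"), "b"), (frozenset("b"), "c")]
    print([x for x in "abc" if is_prime(x, "abc", fds)])
```

The point of the approach proposed in this paper is precisely to avoid such an enumeration of subsets by propagating the relevant conditions bottom-up along a tree decomposition.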

Our datalog program in Section 5 was obtained by an ad hoc construction rather than via a generic 
transformation from MSO. Nevertheless, we are convinced that the idea of a bottom-up propagation of 
certain conditions is quite generally applicable. We are therefore planning to tackle many more problems 
whose fixed-parameter tractability was established via Courcelle's Theorem with this new approach. We have already incorporated 
some optimizations into our implementation. Further improvements are on the way (in particular, further 
heuristics to prune irrelevant parts of the search space). 